Keywords: Belief Propagation, Graphical Models 



Convergent and Correct Message Passing Schemes for 
Optimization Problems over Graphical Models 






Nicholas Ruozzi 

Department of Computer Science 
Yale University 
New Haven, CT 06520, USA 
Email: nicholas.ruozzi@yale.edu 

Sekhar Tatikonda 

Department of Electrical Engineering 
Yale University 
New Haven, CT 06520, USA 
Email: sekhar.tatikonda@yale.edu



Editor: 






Abstract 

The max-product algorithm, which attempts to compute the maximizing assignment of 
a given objective function, has recently found applications in quadratic minimization and 
combinatorial optimization. Unfortunately, the max-product algorithm is not guaranteed 
to converge and, even if it does, is not guaranteed to produce the optimal assignment. In 
this work, we provide a simple derivation of a new family of message passing algorithms. 
We first show how to arrive at this general message passing scheme by "splitting" the 
factors of our graphical model, and then we demonstrate that this construction can be 
extended beyond integral splitting. We prove that, for any objective function that attains 
its maximum value over its domain, this new family of message passing algorithms always 
contains a message passing scheme that guarantees correctness upon convergence to a 
unique estimate. Finally, we adopt an asynchronous message passing schedule and prove 
that, under mild assumptions, such a schedule guarantees the convergence of our algorithm. 



1. Introduction 



Belief propagation was originally formulated by Judea Pearl as a distributed algorithm to
perform statistical inference on probability distributions (Pearl, 1982). His primary
observation was that computing marginals is, in general, an expensive operation. However, if the
probability distribution can be written as a product of smaller factors that each depend on only
a small subset of the variables, then one could possibly compute the marginals much faster.
This "factorization" is captured by a corresponding graphical model. Pearl demonstrated
that, when the graphical model is a tree, the belief propagation algorithm is guaranteed to
converge to the exact marginals of the input probability distribution. If the algorithm is
run on an arbitrary graph that is not a tree, then neither convergence nor correctness is
guaranteed.

Max-product, which Pearl dubbed belief revision, is a variant of the belief propagation
algorithm in which the summations are replaced by maximizations. The goal of the max-product
algorithm is to compute the assignment of the variables that maximizes a given objective
function. In general, computing such an assignment is an NP-hard problem, but for graphical
models possessing a single cycle the algorithm is guaranteed to converge to the maximizing
assignment under a few mild assumptions (Weiss, 2000). Over arbitrary graphical
models, the max-product algorithm may fail to converge (Malioutov et al., 2006) or, worse,
may converge to an assignment that is not optimal (Weiss and Freeman, 2001). Despite these
difficulties, max-product and its variants have found empirical success in a variety of
application areas including statistical physics, combinatorial optimization (Bayati et al., 2005;
Sanghavi and Shah; Sanghavi et al.; Ruozzi and Tatikonda, 2008), computer
vision, clustering (Frey and Dueck, 2007), error-correcting codes (Berrou et al., 1993), and the
minimization of convex functions (Moallemi and Van Roy, 2007); however, rigorously
characterizing their behavior outside of a few well-structured instances has proved challenging.

In order to resolve the difficulties presented by the standard max-product algorithm,
several alternate message passing schemes have been proposed to compute the maximizing
assignment over arbitrary graphical models: MPLP (Globerson and Jaakkola, 2007),
tree-reweighted max-product (TRMP) (Wainwright et al., 2005), and max-sum diffusion (MSD)
(Werner, 2007). Recently, all of these algorithms were shown to be members of a class of
"bound minimizing" algorithms for which, under a suitable update schedule, convergence
of the algorithms is guaranteed (Meltzer et al., 2009).

The TRMP algorithm is the max-product analog of the tree-reweighted belief propaga- 
tion algorithm (TRBP). TRBP, like belief propagation (BP), is an algorithm designed to 
compute the marginals of a given probability distribution. The key insight that the TRMP 
algorithm exploits is the observation that the max-product algorithm is correct on trees. 
The TRMP algorithm begins by choosing a probability distribution over spanning trees of 
the factor graph and then rewrites the original distribution as an expectation over span- 
ning trees. With this simple rewriting and subsequent derivation of a new message passing 
scheme, one can show that, for discrete state spaces, TRMP guarantees correctness upon
convergence to a unique estimate (Wainwright et al., 2005). These results were expanded in
subsequent works (Kolmogorov and Wainwright, 2005; Kolmogorov, 2006), and recently, a
serial version of TRMP denoted TRW-S was shown to be provably convergent (Kolmogorov, 2006).

The MPLP algorithm is derived from a special form of the dual linear programming 
relaxation of the maximization problem. Over discrete state spaces, the algorithm is guar- 
anteed to converge and is correct upon convergence to a unique estimate. Unlike the TRMP 
algorithm, the MPLP algorithm does not require a choice of parameters. Because choosing 
the constants for TRMP does require some care, the MPLP algorithm may seem prefer- 
able. However, as we will demonstrate by example, the constants provide some flexibility 
to overcome bad behavior of the algorithm. For example, there are applications over con- 
tinuous state spaces for which the choice of constants is critical to convergence and cor- 
rectness. One such example is the quadratic minimization problem. For this application, 
there exist positive definite matrices for which the TRMP message passing scheme does
not converge to the correct minimizing assignment, regardless of the chosen distribution over
spanning trees (Ruozzi and Tatikonda, 2010).

We propose a new message passing scheme for solving the maximization problem
based on a simple "splitting" heuristic. Our contributions include:

• A simple and novel derivation of our message passing scheme for general factor graphs.

• A message passing schedule and conditions under which our algorithm converges.

• A simple choice of the parameters such that if each of the beliefs $b_i(x_i)$ has a unique
argmax then the output of the algorithm is a local optimum of the objective function.

• A simple choice of the parameters such that if each of the beliefs $b_i(x_i)$ has a unique
argmax then the output of the algorithm is a global optimum of the objective function.

• Conditions under which the algorithm cannot converge to a unique globally optimal 
estimate. 

Unlike MPLP, TRMP, and MSD, the derivation of this algorithm is surprisingly simple, 
and the update rules closely mirror the standard max-product message updates. Because of 
its simplicity, we are able to present the algorithm in its most general form: our algorithm 
is not restricted to binary state spaces or pairwise factor graphs. More importantly, almost 
all of the intuition for the standard max-product algorithm can be extended with very little 
effort to our framework. 

Like TRMP, our algorithm requires choosing a set of constants. Indeed, TRMP can 
be seen as a special case of our algorithm. However, unlike TRMP, any choice of non-zero 
constants will suffice to produce a valid message passing algorithm. In this way, our message 
passing scheme is more appropriately thought of as a family of message passing algorithms. 
We will show that, assuming the messages passed by the algorithm are always finite, there is 
always a simple choice of constants that will guarantee convergence. Further, if we are able 
to extract a unique estimate from the converged beliefs then this estimate is guaranteed to 
be the maximizing assignment. 

The outline of this paper is as follows: in Section 2 we review the max-product algorithm
and other relevant background material, in Section 3 we derive a new message passing
algorithm by splitting factor nodes and prove some basic results, in Section 4 we explore
the local and global optimality of the fixed points of our message passing scheme, in Section
5 we provide an alternate message passing schedule under which the algorithm is guaranteed
to converge and demonstrate that the algorithm cannot always produce a tight lower bound
to the objective function, in Section 6 we show how to strengthen the results of the previous
sections for the special case in which the alphabet is binary and the factors are pairwise,
and we conclude in Section 7.

2. Preliminaries 

Figure 1: Factor graph corresponding to $f(x_1, x_2, x_3) = \phi_1 + \phi_2 + \phi_3 + \psi_{12} + \psi_{23} + \psi_{13}$.
By convention, variable nodes are represented as circles and factor nodes are represented
as squares.

Before we proceed to our results, we will briefly review the relevant background material
pertaining to message passing algorithms. The focus of this paper will be on solving
minimization problems for which we can write the objective function as a sum of functions
over fewer variables. These "smaller" functions are called potentials. We note that this is
equivalent to the problem of maximizing a product of non-negative potentials, as we can
convert the maximum over a product of potentials into a minimum over a sum by taking
negative logs. Although the max-product formulation is more popular in the literature, for
notational reasons that will become clear in the sequel, we will use the min-sum formulation.
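The equivalence between maximizing a product of non-negative potentials and minimizing a sum of their negative logs can be checked directly on a small instance. The sketch below uses hypothetical potential values on two binary variables; the names `phi1`, `phi2`, and `psi12` are illustrative and do not come from the text.

```python
import itertools
import math

# Hypothetical strictly positive potentials on two binary variables.
phi1 = {0: 0.6, 1: 0.4}
phi2 = {0: 0.3, 1: 0.7}
psi12 = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}

def product_objective(x1, x2):
    # Max-product form: a product of potentials.
    return phi1[x1] * phi2[x2] * psi12[(x1, x2)]

def sum_objective(x1, x2):
    # Min-sum form: negative logs turn the product into a sum.
    return -math.log(phi1[x1]) - math.log(phi2[x2]) - math.log(psi12[(x1, x2)])

assignments = list(itertools.product([0, 1], repeat=2))
argmax_product = max(assignments, key=lambda x: product_objective(*x))
argmin_sum = min(assignments, key=lambda x: sum_objective(*x))
```

Because the negative log is strictly decreasing, both formulations select the same assignment.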

Let $f : \mathcal{X}^n \to \mathbb{R} \cup \{\infty\}$, where $\mathcal{X}$ is an arbitrary set (e.g. $\mathbb{R}$, $\{0, 1\}$, $\mathbb{Z}$, etc.). Throughout
this paper, we will be interested in finding an element $x \in \mathcal{X}^n$ that minimizes
$f$, and as such, we will assume that there is such an element. For an arbitrary function,
computing this minimum may be difficult, especially if $n$ is large. The basic observation
of the min-sum algorithm is that, even though the original minimization problem may
be difficult, if $f$ can be written as a sum of functions depending on only a small subset
of the variables, then we may be able to minimize the global function by performing a
series of minimizations over (presumably easier) sub-problems. To make this concrete, let
$\mathcal{A} \subseteq 2^{\{1,\ldots,n\}}$. We say that $f$ factorizes over $\mathcal{A}$ if we can write $f$ as a sum of real valued
potential functions $\phi_i : \mathcal{X} \to \mathbb{R} \cup \{\infty\}$ and $\psi_\alpha : \mathcal{X}^{|\alpha|} \to \mathbb{R} \cup \{\infty\}$ as follows:

$$f(x) = \sum_i \phi_i(x_i) + \sum_{\alpha \in \mathcal{A}} \psi_\alpha(x_\alpha) \tag{1}$$

Every factorization of $f$ has a corresponding graphical representation known as a factor
graph. The factor graph consists of a variable node $i$ for each variable $x_i$ and a factor node $\alpha$ for each
of the factors $\psi_\alpha$, with an edge joining the factor node corresponding to $\alpha$ to the variable
node representing $x_i$ if $i \in \alpha$. For a concrete example, see Figure 1. The min-sum algorithm
is a message passing algorithm on this factor graph. In the algorithm, there are two types of
messages: messages passed from variable nodes to factor nodes and messages passed from
factor nodes to variable nodes. On the $t$-th iteration of the algorithm, messages are passed
along each edge of the factor graph as follows:

$$m^t_{i \to \alpha}(x_i) = \kappa + \phi_i(x_i) + \sum_{\beta \in \partial i \setminus \alpha} m^{t-1}_{\beta \to i}(x_i) \tag{2}$$

$$m^t_{\alpha \to i}(x_i) = \kappa + \min_{x_{\alpha \setminus i}} \Big[ \psi_\alpha(x_\alpha) + \sum_{k \in \alpha \setminus i} m^{t-1}_{k \to \alpha}(x_k) \Big] \tag{3}$$

where $\partial i$ denotes the set of all $\alpha \in \mathcal{A}$ such that $i \in \alpha$ (intuitively, this is the set of neighbors
of variable node $x_i$ in the factor graph), $x_\alpha$ is the vector formed from the entries of $x$ by
selecting only the indices in $\alpha$, and $\alpha \setminus i$ is abusive notation for the set-theoretic difference
$\alpha \setminus \{i\}$.

Each message update has an arbitrary normalization factor $\kappa$. Because $\kappa$ is not a
function of any of the variables, it only affects the value of the minimum and not where
the minimum is located. As such, we are free to choose it however we like for each message
and each time step. In practice, these constants are used to avoid numerical issues that
may arise during execution of the algorithm. We will think of the messages as a vector of
functions indexed by the edge over which the message is passed.
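The updates in Equations (2) and (3) can be sketched in a few lines for finite state spaces. The function `min_sum`, its argument layout, and the chain-structured example are assumptions made for illustration; the normalization $\kappa$ is chosen here, as one possible convention, so that the smallest entry of every message is zero.

```python
import itertools

def min_sum(domain, phi, psi, iterations):
    """Synchronous min-sum; returns the factor-to-variable messages.

    phi: dict i -> {value: cost}; psi: dict alpha -> function of the tuple x_alpha,
    where alpha is a tuple of variable ids.
    """
    edges = [(a, i) for a in psi for i in a]
    f2v = {(a, i): {x: 0.0 for x in domain} for a, i in edges}  # m_{alpha -> i}
    v2f = {(i, a): {x: 0.0 for x in domain} for a, i in edges}  # m_{i -> alpha}

    def normalize(msg):
        low = min(msg.values())  # kappa: shift so the smallest entry is zero
        return {x: v - low for x, v in msg.items()}

    for _ in range(iterations):
        new_v2f, new_f2v = {}, {}
        for a, i in edges:
            # Equation (2): variable-to-factor message.
            new_v2f[(i, a)] = normalize({
                x: phi[i][x] + sum(f2v[(b, i)][x] for b in psi if i in b and b != a)
                for x in domain})
            # Equation (3): factor-to-variable message.
            others = [k for k in a if k != i]
            msg = {}
            for x in domain:
                best = float("inf")
                for rest in itertools.product(domain, repeat=len(others)):
                    assign = dict(zip(others, rest))
                    assign[i] = x
                    cost = psi[a](tuple(assign[k] for k in a))
                    cost += sum(v2f[(k, a)][assign[k]] for k in others)
                    best = min(best, cost)
                msg[x] = best
            new_f2v[(a, i)] = normalize(msg)
        v2f, f2v = new_v2f, new_f2v
    return f2v

# Hypothetical chain x1 - x2 - x3 (a tree) with binary variables.
domain = [0, 1]
phi = {1: {0: 0.0, 1: 1.2}, 2: {0: 0.5, 1: 0.0}, 3: {0: 2.0, 1: 0.0}}
disagree = lambda xa: 0.0 if xa[0] == xa[1] else 1.0
psi = {(1, 2): disagree, (2, 3): disagree}

f2v = min_sum(domain, phi, psi, iterations=6)
beliefs = {i: {x: phi[i][x] + sum(f2v[(a, i)][x] for a in psi if i in a) for x in domain}
           for i in phi}
estimate = tuple(min(domain, key=beliefs[i].get) for i in sorted(beliefs))
```

On this tree the estimate read off from the node-wise sums of incoming messages agrees with the brute-force minimizer, as expected from the correctness of min-sum on trees.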

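Definition 1 A vector of messages $m = \{m_{\alpha \to i}, m_{i \to \alpha}\}$ is finite if for all $\alpha \in \mathcal{A}$, $i \in \alpha$,
and $x_i \in \mathcal{X}$, $|m_{\alpha \to i}(x_i)| < \infty$ and $|m_{i \to \alpha}(x_i)| < \infty$.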

Any vector of finite messages is a valid choice for the vector of initial messages $m^0$, but
the choice of initial messages can greatly affect the behavior of the algorithm. A typical
assumption is that the initial messages are chosen such that $m^0_{\alpha \to i} \equiv 0$ and $m^0_{i \to \alpha} \equiv 0$.

We want to use the messages in order to construct an estimate of the min-marginals
of $f$. A min-marginal of $f$ is a function of one variable obtained by minimizing the
function $f$ over all of the remaining variables. The min-marginal for the variable $x_i$ would
be $\min_{x' : x'_i = x_i} f(x')$, which is a function of $x_i$. Given any vector of messages, $m^t$, we can
construct a set of beliefs that are intended to approximate the min-marginals of $f$:

$$b^t_i(x_i) = \kappa + \phi_i(x_i) + \sum_{\alpha \in \partial i} m^t_{\alpha \to i}(x_i) \tag{4}$$

$$b^t_\alpha(x_\alpha) = \kappa + \psi_\alpha(x_\alpha) + \sum_{i \in \alpha} m^t_{i \to \alpha}(x_i) \tag{5}$$

If $b_i(x_i) = \min_{x' : x'_i = x_i} f(x')$, then for any $y_i \in \arg\min_{x_i} b_i(x_i)$ there exists a vector
$x^*$ such that $x^*_i = y_i$ and $x^*$ minimizes the function $f$. If $|\arg\min_{x_i} b_i(x_i)| = 1$ for
all $i$, then we can take $x^* = y$, but if the objective function has more than one optimal
solution, then we may not be able to construct such an $x^*$ so easily. For this reason, one
commonly assumes that the objective function has a unique global minimum. Although
this assumption is common, we will not adopt this convention in this work. Unfortunately,
because our beliefs are not necessarily the true min-marginals, we can only approximate
the optimal assignment by computing an estimate of the argmin:

$$x^t_i \in \arg\min_{x_i} b^t_i(x_i) \tag{6}$$

Definition 2 A vector, $b$, of beliefs admits a unique estimate, $x^*$, if $x^*_i \in \arg\min_{x_i} b_i(x_i)$
and the argmin is unique for each $i$.
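To make min-marginals and Definition 2 concrete, the following sketch computes exact min-marginals by brute force and then tries to extract a unique estimate. The helper names `min_marginals` and `unique_estimate` and both objective functions are hypothetical; brute-force enumeration is only feasible for very small $n$.

```python
import itertools

def min_marginals(f, domain, n):
    """For each variable i and value x_i, minimize f over all x' with x'_i = x_i."""
    table = [{x: float("inf") for x in domain} for _ in range(n)]
    for assignment in itertools.product(domain, repeat=n):
        value = f(assignment)
        for i, xi in enumerate(assignment):
            table[i][xi] = min(table[i][xi], value)
    return table

def unique_estimate(beliefs):
    """Return the estimate if every belief has a unique argmin, else None."""
    estimate = []
    for b in beliefs:
        low = min(b.values())
        minimizers = [x for x, v in b.items() if v == low]
        if len(minimizers) != 1:
            return None
        estimate.append(minimizers[0])
    return tuple(estimate)

# Two minimizers, (0, 0) and (1, 1): the argmins are tied for each variable.
xor = lambda x: float(x[0] != x[1])
# A unique minimizer at (0, 0).
tilted = lambda x: x[0] + 2 * x[1] + 3 * (x[0] != x[1])

ties = unique_estimate(min_marginals(xor, [0, 1], 2))
single = unique_estimate(min_marginals(tilted, [0, 1], 2))
```

For `xor`, choosing a minimizer independently for each variable could produce $(0, 1)$, which is not optimal; this is the difficulty that arises when the objective has more than one optimal solution.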

If the algorithm converges to a collection of beliefs from which we can extract a unique 
estimate x*, then we hope that the vector x* is indeed a global minimum of the objective 
function. 

2.1 Computation Trees 

An important tool in the analysis of the min-sum algorithm is the notion of a computation
tree. Intuitively, the computation tree is an unrolled version of the original graph that
captures the evolution of the messages passed by the min-sum algorithm needed to compute
the belief at time $t$ at a particular node of the factor graph. Computation trees describe
the evolution of the beliefs over time, which, in some cases, can help us prove correctness
and/or convergence of the message passing updates.

Figure 2: The computation tree at time $t = 4$ rooted at the variable node $x_1$ of the factor
graph in Figure 1. The variable nodes have been labeled with their potentials for emphasis.

The depth $t$ computation tree rooted at node $i$ contains all of the length $t$ non-backtracking
walks in the factor graph starting at node $i$. For any node $v$ in the factor graph, the
computation tree at time $t$ rooted at $v$, denoted by $T_v(t)$, is defined recursively as follows: $T_v(0)$ is
just the node $v$, the root of the tree. The tree $T_v(t)$ at time $t$ is generated from $T_v(t - 1)$ by
adding to each leaf of $T_v(t - 1)$ a copy of each of its neighbors in $G$ (and the corresponding
edge), except for the neighbor that is already present in $T_v(t - 1)$. Each node of $T_v(t)$ is a
copy of a node in $G$, and the potentials on the nodes in $T_v(t)$, which operate on a subset
of the variables in $T_v(t)$, are copies of the potentials of the corresponding nodes in $G$. The
construction of a computation tree for the graph in Figure 1 is pictured in Figure 2. Note
that each variable node in $T_v(t)$ represents a distinct copy of some variable $x_j$ in the original
graph.
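The recursive definition of $T_v(t)$ translates directly into code. The sketch below assumes the factor graph is given as an adjacency dictionary over node labels (the labels `x1`, `psi12`, and so on are illustrative) and represents a tree as a pair `(label, children)`.

```python
def computation_tree(neighbors, root, depth):
    """Unroll a factor graph into T_root(depth), returned as (label, [subtrees])."""
    def grow(node, parent, remaining):
        if remaining == 0:
            return (node, [])
        # Copy every neighbor except the one already present as the parent.
        children = [grow(w, node, remaining - 1)
                    for w in neighbors[node] if w != parent]
        return (node, children)
    return grow(root, None, depth)

def size(tree):
    return 1 + sum(size(child) for child in tree[1])

def leaves(tree):
    label, children = tree
    if not children:
        return [label]
    return [leaf for child in children for leaf in leaves(child)]

# Adjacency of the pairwise factor graph on three variables described for Figure 1.
neighbors = {
    "x1": ["psi12", "psi13"], "x2": ["psi12", "psi23"], "x3": ["psi23", "psi13"],
    "psi12": ["x1", "x2"], "psi23": ["x2", "x3"], "psi13": ["x1", "x3"],
}
tree = computation_tree(neighbors, "x1", 4)
```

Every node in this graph has degree two, so the tree rooted at `x1` consists of two non-backtracking walks of length four, one in each direction around the cycle.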

Given any initialization of the messages, $T_v(t)$ captures the information available to
node $v$ at time $t$. At time $t = 0$, node $v$ has received only the initial messages from its
neighbors, so $T_v(0)$ consists only of $v$. At time $t = 1$, $v$ receives the round one messages
from all of its neighbors, so $v$'s neighbors are added to the tree. These round one messages
depend only on the initial messages, so the tree terminates at this point. By construction,
we have the following lemma:

Lemma 3 The belief at node $v$ produced by the min-sum algorithm at time $t$ corresponds
to the exact min-marginal at the root of $T_v(t)$ whose boundary messages are given by the
initial messages.

Proof See, for example, Tatikonda and Jordan (2002) and Weiss and Freeman (2001).



2.2 Fixed Point Properties 

Computation trees provide us with a dynamic view of the min-sum algorithm. After a finite
number of time steps, we hope that the beliefs on the computation trees stop changing. In
practice, when the beliefs change by less than some small amount, we say that the algorithm
has converged. If the messages of the min-sum algorithm converge then the converged
messages must be fixed points of the message update equations.

Ideally, the converged beliefs would be the true min-marginals of the function $f$. If the
beliefs are the exact min-marginals, then the estimate corresponding to our beliefs would
indeed be the global minimum. Unfortunately, the algorithm is only known to produce the
exact min-marginals on special factor graphs (e.g. when the factor graph is a tree). Instead,
we will show that the fixed point beliefs are almost like min-marginals. Like the messages,
we will think of the beliefs as a vector of functions indexed by the nodes of the factor graph.
Consider the following definitions:

Definition 4 A vector of beliefs, $b$, is admissible for a function $f$ if

$$f(x) = \kappa + \sum_i b_i(x_i) + \sum_\alpha \Big[ b_\alpha(x_\alpha) - \sum_{k \in \alpha} b_k(x_k) \Big]$$

Definition 5 A vector of beliefs, $b$, is min-consistent if for all $\alpha$ and all $i \in \alpha$:

$$\min_{x_{\alpha \setminus i}} b_\alpha(x_\alpha) = \kappa + b_i(x_i)$$

Any vector of beliefs that satisfies these two properties provides a meaningful reparame- 
terization of the original objective function. We can show that any vector of beliefs obtained 
from a fixed point of the message updates does indeed satisfy these two properties: 

Theorem 6 For any vector of fixed point messages, the corresponding beliefs are admissible
and min-consistent.

Proof See Wainwright et al. (2004), Proposition 2, and Lemmas 7 and 8 below.

For any objective function $f$ such that $|f(x)| < \infty$ for all $x$, there always exists a fixed
point of the min-sum message passing updates (see Theorem 2 of Wainwright et al. (2004)).
Moreover, the min-sum algorithm is guaranteed to converge to the correct solution on factor
graphs that are trees. However, convergence and correctness for arbitrary factor graphs have
only been demonstrated for a few special cases (Wainwright et al., 2004; Weiss, 2000).



3. A General Splitting Heuristic 

In this section, we introduce a family of message passing algorithms parameterized by a
vector of reals. The intuition for this family of algorithms is simple: given any factorization
of the objective function $f$, we can split any of the factors into several pieces and obtain a
new factorization of the objective function $f$. The standard notation masks the fact that
each of the potentials may further factorize into smaller pieces. For example, suppose we
are given the objective function $f(x_1, x_2) = x_1 + x_2 + x_1 x_2$. There are many different ways
that we can factorize $f$:

$$f(x_1, x_2) = x_1 + x_2 + x_1 x_2 \tag{7}$$
$$= x_1 + (x_2 + x_1 x_2) \tag{8}$$
$$= (x_1 + x_2 + x_1 x_2) \tag{9}$$
$$= x_1 + x_2 + \frac{x_1 x_2}{2} + \frac{x_1 x_2}{2} \tag{10}$$

Figure 3: New factor graph formed by splitting the $\psi_{13}$ potential of the factor graph in
Figure 1 into two potentials.

Each of these represents a factorization of $f$ into a different number of potentials (the
parentheses indicate a single potential function). All of these can be captured by the standard
min-sum algorithm except for the last. Recall that $\mathcal{A}$ was taken to be a subset of $2^{\{1,\ldots,n\}}$.
In order to accommodate the factorization given by Equation 10, we will now allow $\mathcal{A}$ to
be a multiset over the set $2^{\{1,\ldots,n\}}$. We can then construct the factor graph as before with a
distinct factor node for each element of the multiset $\mathcal{A}$. We can use the standard min-sum
algorithm in an attempt to compute the minimum of $f$ given this new factorization.
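The four factorizations in Equations (7) through (10) can be written down as lists of (scope, potential) pairs, which makes the need for a multiset explicit: the last factorization contains the scope $\{1, 2\}$ twice. The representation below is a sketch with hypothetical names (`eq7` through `eq10`, `evaluate`); variables are indexed from zero.

```python
import itertools
from collections import Counter

# Each factorization is a list (a multiset) of (scope, potential) pairs.
# A potential receives only the entries of x selected by its scope.
eq7 = [((0,), lambda v: v[0]), ((1,), lambda v: v[0]),
       ((0, 1), lambda v: v[0] * v[1])]
eq8 = [((0,), lambda v: v[0]), ((0, 1), lambda v: v[1] + v[0] * v[1])]
eq9 = [((0, 1), lambda v: v[0] + v[1] + v[0] * v[1])]
eq10 = [((0,), lambda v: v[0]), ((1,), lambda v: v[0]),
        ((0, 1), lambda v: v[0] * v[1] / 2), ((0, 1), lambda v: v[0] * v[1] / 2)]

def evaluate(factorization, x):
    """Sum all potentials of a factorization at the point x."""
    return sum(p(tuple(x[i] for i in scope)) for scope, p in factorization)

grid = list(itertools.product(range(-2, 3), repeat=2))
all_agree = all(evaluate(fac, x) == x[0] + x[1] + x[0] * x[1]
                for fac in (eq7, eq8, eq9, eq10) for x in grid)
scope_counts = Counter(scope for scope, _ in eq10)
```

All four lists define the same function; only `eq10` repeats a scope, which a plain subset of $2^{\{1,\ldots,n\}}$ cannot express.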

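We could, of course, rewrite the objective function in many different ways. However,
arbitrarily rewriting the objective function could significantly increase the size of the factor
graph, and such rewriting may not make the minimization problem any easier. In this paper,
we will focus on one special rewriting of the objective function. Suppose $f$ factorizes over
$\mathcal{A}$ as in Equation 1. Let $G$ be the corresponding factor graph. Suppose now that we take
one potential $\alpha \in \mathcal{A}$ and split it into $k$ potentials $\alpha_1, \ldots, \alpha_k$ such that for each $j \in \{1, \ldots, k\}$,
$\psi_{\alpha_j} = \psi_\alpha / k$. This allows us to rewrite the objective function, $f$, as

$$f(x) = \sum_i \phi_i(x_i) + \sum_{\beta \in \mathcal{A}} \psi_\beta(x_\beta) \tag{11}$$
$$= \sum_i \phi_i(x_i) + \sum_{\beta \in \mathcal{A} \setminus \alpha} \psi_\beta(x_\beta) + \sum_{j=1}^{k} \frac{\psi_\alpha(x_\alpha)}{k} \tag{12}$$
$$= \sum_i \phi_i(x_i) + \sum_{\beta \in \mathcal{A} \setminus \alpha} \psi_\beta(x_\beta) + \sum_{j=1}^{k} \psi_{\alpha_j}(x_\alpha) \tag{13}$$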

This rewriting does not change the objective function, but it does produce a new factor
graph $F$ (see Figure 3). Now, take some $i \in \alpha$ and consider the messages $m_{i \to \alpha_j}$ and $m_{\alpha_j \to i}$
given by the standard min-sum algorithm:

$$m^t_{i \to \alpha_j}(x_i) = \kappa + \phi_i(x_i) + \sum_{\beta \in \partial_F i \setminus \alpha_j} m^{t-1}_{\beta \to i}(x_i) \tag{14}$$

$$m^t_{\alpha_j \to i}(x_i) = \kappa + \min_{x_{\alpha \setminus i}} \Big[ \frac{\psi_\alpha(x_\alpha)}{k} + \sum_{k \in \alpha_j \setminus i} m^{t-1}_{k \to \alpha_j}(x_k) \Big] \tag{15}$$

Figure 4: New factor graph formed from the factor graph in Figure 1 by splitting the variable
node $x_1$ into two variables $x_4$ and $x_5$. The new potentials are given by $\phi_4 = \phi_5 = \phi_1 / 2$,
$\psi_{245} = \psi_{12}(x_4, x_2) - \log\{x_4 = x_5\}$, and $\psi_{345} = \psi_{13}(x_4, x_3) - \log\{x_4 = x_5\}$.

where $\partial_F i$ denotes the neighbors of $i$ in $F$. Notice that there is an automorphism of the
graph that maps $\alpha_l$ to $\alpha_j$. As the messages passed from any node only depend on the
messages received at the previous time step, if the initial messages are the same at both
of these nodes, then they must produce identical messages at time 1. More formally, if
we initialize the messages identically over each split edge, then, at any time step $t > 0$,
$m^t_{i \to \alpha_l}(x_i) = m^t_{i \to \alpha_j}(x_i)$ and $m^t_{\alpha_l \to i}(x_i) = m^t_{\alpha_j \to i}(x_i)$ for any $l \in \{1, \ldots, k\}$ by symmetry (i.e.
there is an automorphism of the graph that maps $\alpha_l$ to $\alpha_j$). Because of this, we can rewrite
the message from $i$ to $\alpha_j$ as:

$$m^t_{i \to \alpha_j}(x_i) = \kappa + \phi_i(x_i) + \sum_{\beta \in \partial_F i \setminus \alpha_j} m^{t-1}_{\beta \to i}(x_i) \tag{16}$$
$$= \kappa + \phi_i(x_i) + \sum_{l \neq j} m^{t-1}_{\alpha_l \to i}(x_i) + \sum_{\beta \in \partial_G i \setminus \alpha} m^{t-1}_{\beta \to i}(x_i) \tag{17}$$
$$= \kappa + \phi_i(x_i) + (k - 1)\, m^{t-1}_{\alpha_j \to i}(x_i) + \sum_{\beta \in \partial_G i \setminus \alpha} m^{t-1}_{\beta \to i}(x_i) \tag{18}$$

Notice that Equation 18 can be viewed as a message passing algorithm on the original
factor graph. The primary difference then between Equation 18 and the standard min-sum
updates is that the message passed from $i$ to $\alpha$ now depends on the message from $\alpha$ to $i$.

Analogously, we can also split the variable nodes. Suppose $f$ factorizes over $\mathcal{A}$ as
in Equation 1. Let $G$ be the corresponding factor graph. Suppose now that we take one
variable $x_i$ and split it into $k$ variables $x_{i_1}, \ldots, x_{i_k}$ such that for each $l \in \{1, \ldots, k\}$, $\phi_{i_l} = \phi_i / k$.
This produces a new factor graph, $F$. Because $x_{i_1}, \ldots, x_{i_k}$ are all the same variable, we must
add a constraint to ensure that they are indeed the same. Next, we need to modify the
potentials to incorporate the constraint and the change of variables. We will construct $\mathcal{A}_F$
such that for each $\alpha \in \mathcal{A}$ with $i \in \alpha$ there is a $\beta = (\alpha \setminus i) \cup \{i_1, \ldots, i_k\}$ in $\mathcal{A}_F$. Define
$\psi_\beta(x_\beta) = \psi_\alpha(x_{\alpha \setminus i}, x_{i_1}) - \log\{x_{i_1} = \ldots = x_{i_k}\}$, where $\{x_{i_1} = \ldots = x_{i_k}\}$ is the 0-1 indicator
function for the equality constraint. For each $\alpha \in \mathcal{A}$ with $i \notin \alpha$ we simply add $\alpha$ to $\mathcal{A}_F$
with its old potential. For an example of this construction, see Figure 4. This rewriting
produces a new objective function

$$g(x) = \sum_{j \neq i} \phi_j(x_j) + \sum_{l=1}^{k} \frac{\phi_i(x_{i_l})}{k} + \sum_{\alpha \in \mathcal{A}_F} \psi_\alpha(x_\alpha) \tag{19}$$



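Minimizing $g$ is equivalent to minimizing $f$. Again, we will show that we can collapse
the min-sum message passing updates over $F$ to message passing updates over $G$ with
modified potentials. Take some $\alpha \in \mathcal{A}_F$ containing the new variable $i_1$ which augments
the potential $\gamma \in \mathcal{A}$ and consider the messages $m_{i_1 \to \alpha}$ and $m_{\alpha \to i_1}$ given by the standard
min-sum algorithm:

$$m^t_{i_1 \to \alpha}(x_{i_1}) = \kappa + \frac{\phi_i(x_{i_1})}{k} + \sum_{\beta \in \partial_F i_1 \setminus \alpha} m^{t-1}_{\beta \to i_1}(x_{i_1}) \tag{20}$$

$$m^t_{\alpha \to i_1}(x_{i_1}) = \kappa + \min_{x_{\alpha \setminus i_1}} \Big[ \psi_\alpha(x_\alpha) + \sum_{k \in \alpha \setminus i_1} m^{t-1}_{k \to \alpha}(x_k) \Big] \tag{21}$$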



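Again, if we initialize the messages identically over each split edge, then, at any time
step $t > 0$, $m^t_{i_l \to \alpha}(x_i) = m^t_{i_1 \to \alpha}(x_i)$ and $m^t_{\alpha \to i_l}(x_i) = m^t_{\alpha \to i_1}(x_i)$ for any $l \in \{1, \ldots, k\}$ by
symmetry. Using this, we can rewrite the message from $\alpha$ to $i_1$ as:

$$m^t_{\alpha \to i_1}(x_{i_1}) = \kappa + \min_{x_{\alpha \setminus i_1}} \Big[ \psi_\alpha(x_\alpha) + \sum_{k \in \alpha \setminus i_1} m^{t-1}_{k \to \alpha}(x_k) \Big] \tag{22}$$
$$= \kappa + \min_{x_{\alpha \setminus i_1}} \Big[ \psi_\gamma(x_{\gamma \setminus i}, x_{i_1}) - \log\{x_{i_1} = \ldots = x_{i_k}\} + \sum_{k \in \alpha \setminus i_1} m^{t-1}_{k \to \alpha}(x_k) \Big] \tag{23}$$
$$= \kappa + \min_{x_{\alpha \setminus i_1}} \Big[ \psi_\gamma(x_{\gamma \setminus i}, x_{i_1}) - \log\{x_{i_1} = \ldots = x_{i_k}\} + \sum_{l \neq 1} m^{t-1}_{i_l \to \alpha}(x_{i_l}) + \sum_{k \in \gamma \setminus i} m^{t-1}_{k \to \alpha}(x_k) \Big] \tag{24}$$
$$= \kappa + \min_{x_{\gamma \setminus i}} \Big[ \psi_\gamma(x_{\gamma \setminus i}, x_{i_1}) + \sum_{l \neq 1} m^{t-1}_{i_l \to \alpha}(x_{i_1}) + \sum_{k \in \gamma \setminus i} m^{t-1}_{k \to \alpha}(x_k) \Big] \tag{25}$$
$$= \kappa + \min_{x_{\gamma \setminus i}} \Big[ \psi_\gamma(x_{\gamma \setminus i}, x_{i_1}) + (k - 1)\, m^{t-1}_{i_1 \to \alpha}(x_{i_1}) + \sum_{k \in \gamma \setminus i} m^{t-1}_{k \to \alpha}(x_k) \Big] \tag{26}$$

By symmetry, we only need to perform one message update to compute $m^t_{\alpha \to i_l}(x_{i_l})$ for
each $l \in \{1, \ldots, k\}$. As a result, we can think of these messages as being passed on the
original factor graph $G$.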

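The combined message updates for each of these splitting operations are presented in
Algorithm 1. Observe that if we choose $c_\alpha = 1$ for each $\alpha$ and $c_i = 1$ for each $i$, then
the message updates described in the algorithm are exactly the min-sum message passing
updates described in the preliminaries. Rewriting the message updates in this way seems
purely cosmetic, but as we will show in the following sections, the choice of the vector $c$ can
influence both the convergence and correctness of the algorithm.

We define the beliefs corresponding to the new message updates as follows:

$$b^t_i(x_i) = \kappa + \frac{\phi_i(x_i)}{c_i} + \sum_{\alpha \in \partial i} c_\alpha\, m^t_{\alpha \to i}(x_i) \tag{27}$$

$$b^t_\alpha(x_\alpha) = \kappa + \frac{\psi_\alpha(x_\alpha)}{c_\alpha} + \sum_{k \in \alpha} c_k \Big[ \frac{\phi_k(x_k)}{c_k} + (c_\alpha - 1)\, m^t_{\alpha \to k}(x_k) + \sum_{\beta \in \partial k \setminus \alpha} c_\beta\, m^t_{\beta \to k}(x_k) \Big] \tag{28}$$

Algorithm 1 Synchronous Splitting Algorithm

1: Initialize the messages to some finite vector.

2: For iteration $t = 1, 2, \ldots$ update the messages as follows:

$$m^t_{i \to \alpha}(x_i) = \kappa + \frac{\phi_i(x_i)}{c_i} + (c_\alpha - 1)\, m^{t-1}_{\alpha \to i}(x_i) + \sum_{\beta \in \partial i \setminus \alpha} c_\beta\, m^{t-1}_{\beta \to i}(x_i)$$

$$m^t_{\alpha \to i}(x_i) = \kappa + \min_{x_{\alpha \setminus i}} \Big[ \frac{\psi_\alpha(x_\alpha)}{c_\alpha} + (c_i - 1)\, m^{t-1}_{i \to \alpha}(x_i) + \sum_{k \in \alpha \setminus i} c_k\, m^{t-1}_{k \to \alpha}(x_k) \Big]$$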



Compare these definitions with Equations 4 and 5. Notice that the bracketed expression
in Equation 28 is the definition of the message $m_{k \to \alpha}$. As we will see in Lemma 7, if we define the beliefs
in this way, then any vector of finite messages will produce a vector of admissible beliefs.
The beliefs are still approximating the min-marginals of $f$, but each variable node has been
split $c_i$ times and each factor node has been split $c_\alpha$ times. Applying Definition 4 to the
new factor graph $F$, a vector of beliefs is admissible for our new message passing
algorithm if

$$f(x) = \kappa + \sum_{i_l} b_{i_l}(x_{i_l}) + \sum_{\alpha_j \in \mathcal{A}_F} \Big[ b_{\alpha_j}(x_{\alpha_j}) - \sum_{k \in \alpha_j} b_k(x_k) \Big] \tag{29}$$
$$= \kappa + \sum_{i \in \{1, \ldots, n\}} c_i b_i(x_i) + \sum_{\alpha \in \mathcal{A}} c_\alpha \Big[ b_\alpha(x_\alpha) - \sum_{k \in \alpha} c_k b_k(x_k) \Big] \tag{30}$$

Throughout this discussion, we have assumed that the vector c contained only positive 
integers. If we allow c to be an arbitrary vector of non-zero reals, then the notion of splitting 
no longer makes sense. Instead, we will think of the vector c as parameterizing a specific 
factorization of the function /. The definitions of the message updates and the beliefs are 
equally valid for any choice of non-zero real constants. In what follows, we will explore the 
properties of this new message passing scheme for a vector c of non-zero reals. 
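The claim that the weighted updates on $G$ reproduce standard min-sum on the split graph $F$ can be checked numerically. The sketch below is one possible reading of Algorithm 1 for finite state spaces; the function `splitting_min_sum`, the use of named factors (so that a scope can repeat), the zero initialization, the choice of $\kappa$, and the two-variable example are all assumptions made for illustration.

```python
import itertools

def splitting_min_sum(domain, phi, factors, c_var, c_fac, iterations):
    """Synchronous splitting updates; returns the messages m_{alpha -> i}.

    phi: dict i -> {value: cost}
    factors: dict name -> (scope, potential), potential a function of x_alpha
    c_var, c_fac: non-zero weights c_i and c_alpha
    """
    edges = [(a, i) for a, (scope, _) in factors.items() for i in scope]
    f2v = {(a, i): {x: 0.0 for x in domain} for a, i in edges}
    v2f = {(i, a): {x: 0.0 for x in domain} for a, i in edges}

    def normalize(msg):
        low = min(msg.values())  # kappa: shift so the smallest entry is zero
        return {x: v - low for x, v in msg.items()}

    for _ in range(iterations):
        new_v2f, new_f2v = {}, {}
        for a, i in edges:
            scope, potential = factors[a]
            # Variable-to-factor message, weighted by c.
            new_v2f[(i, a)] = normalize({
                x: phi[i][x] / c_var[i] + (c_fac[a] - 1) * f2v[(a, i)][x]
                + sum(c_fac[b] * f2v[(b, i)][x]
                      for b, (s, _) in factors.items() if i in s and b != a)
                for x in domain})
            # Factor-to-variable message, weighted by c.
            others = [k for k in scope if k != i]
            msg = {}
            for x in domain:
                costs = []
                for rest in itertools.product(domain, repeat=len(others)):
                    assign = dict(zip(others, rest))
                    assign[i] = x
                    costs.append(
                        potential(tuple(assign[k] for k in scope)) / c_fac[a]
                        + sum(c_var[k] * v2f[(k, a)][assign[k]] for k in others))
                msg[x] = min(costs) + (c_var[i] - 1) * v2f[(i, a)][x]
            new_f2v[(a, i)] = normalize(msg)
        v2f, f2v = new_v2f, new_f2v
    return f2v

# Hypothetical two-variable example with a single pairwise factor.
domain = [0, 1]
phi = {1: {0: 0.0, 1: 1.2}, 2: {0: 0.5, 1: 0.0}}
psi = lambda xa: 0.0 if xa[0] == xa[1] else 1.0
half = lambda xa: psi(xa) / 2

# (a) Weighted updates on the original graph G with c_alpha = 2.
on_g = splitting_min_sum(domain, phi, {"a": ((1, 2), psi)},
                         {1: 1, 2: 1}, {"a": 2}, iterations=10)
# (b) Standard min-sum (all weights 1) on F, where the factor is split in two.
on_f = splitting_min_sum(domain, phi, {"a1": ((1, 2), half), "a2": ((1, 2), half)},
                         {1: 1, 2: 1}, {"a1": 1, "a2": 1}, iterations=10)
gap = max(abs(on_g[("a", i)][x] - on_f[("a1", i)][x])
          for i in (1, 2) for x in domain)
```

With identical initial messages on the split edges, the messages on $G$ and on $F$ coincide at every iteration, so `gap` vanishes up to floating-point error. Nothing in the function requires the weights to be integers, which is the generalization discussed above.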

As before, we want the fixed point beliefs produced by our message passing scheme to
behave like min-marginals (i.e. they are min-consistent) and to produce a
reparameterization of the objective function. Using the definitions above, we have the following
lemmas:

Lemma 7 Let $f$ factorize over $\mathcal{A}$. If $c$ is a vector of non-zero reals, then for any vector of
finite messages, $m$, the corresponding beliefs are admissible.

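Proof Let $m$ be the vector of messages given in the statement of the lemma and $b$ the
corresponding vector of beliefs. For any set of messages, we can rewrite the belief $b_\alpha$ as:

$$b_\alpha(x_\alpha) = \kappa + \frac{\psi_\alpha(x_\alpha)}{c_\alpha} + \sum_{k \in \alpha} c_k \big( b_k(x_k) - m_{\alpha \to k}(x_k) \big) \tag{31}$$

where this equality is obtained by plugging Equation 27 into Equation 28. Using this
observation, we can easily verify that, up to an additive constant, the beliefs satisfy:

$$\sum_i c_i b_i(x_i) + \sum_\alpha c_\alpha \Big[ b_\alpha(x_\alpha) - \sum_{k \in \alpha} c_k b_k(x_k) \Big]$$
$$= \sum_i c_i b_i(x_i) + \sum_\alpha c_\alpha \Big[ \frac{\psi_\alpha(x_\alpha)}{c_\alpha} - \sum_{k \in \alpha} c_k m_{\alpha \to k}(x_k) \Big]$$
$$= \sum_i \Big[ \phi_i(x_i) + \sum_{\alpha \in \partial i} c_i c_\alpha m_{\alpha \to i}(x_i) \Big] + \sum_\alpha \Big[ \psi_\alpha(x_\alpha) - \sum_{k \in \alpha} c_\alpha c_k m_{\alpha \to k}(x_k) \Big]$$
$$= \sum_i \phi_i(x_i) + \sum_\alpha \psi_\alpha(x_\alpha)$$

where the last line follows by observing that $\sum_i \sum_{\alpha \in \partial i} c_i c_\alpha m_{\alpha \to i}(x_i) = \sum_\alpha \sum_{k \in \alpha} c_k c_\alpha m_{\alpha \to k}(x_k)$.
Notice that this subtraction of the messages only makes sense if the messages are finite
valued. ■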
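Lemma 7 can be sanity-checked numerically: for arbitrary finite messages and non-zero weights, the beliefs of Equations (27) and (31) should reproduce $f$ through Equation (30). The sketch below uses randomly drawn (hypothetical) potentials, weights, and messages on a pairwise factor graph over three binary variables, with $\kappa = 0$; none of the numerical values come from the text.

```python
import itertools
import random

random.seed(0)
domain = [0, 1]
variables = [1, 2, 3]
factors = [(1, 2), (2, 3), (1, 3)]

# Hypothetical random potentials, non-zero weights, and finite messages.
phi = {i: {x: random.uniform(-1, 1) for x in domain} for i in variables}
psi = {a: {xa: random.uniform(-1, 1) for xa in itertools.product(domain, repeat=2)}
       for a in factors}
c_var = {i: random.choice([-2.0, 0.5, 3.0]) for i in variables}
c_fac = {a: random.choice([-1.5, 0.25, 2.0]) for a in factors}
f2v = {(a, i): {x: random.uniform(-5, 5) for x in domain} for a in factors for i in a}

def belief_var(i, x):
    # Equation (27) with kappa = 0.
    return phi[i][x] / c_var[i] + sum(c_fac[a] * f2v[(a, i)][x]
                                      for a in factors if i in a)

def belief_fac(a, xa):
    # Equation (31) with kappa = 0.
    return psi[a][xa] / c_fac[a] + sum(
        c_var[k] * (belief_var(k, xk) - f2v[(a, k)][xk]) for k, xk in zip(a, xa))

def objective(x):
    # x maps a variable id to its value.
    return (sum(phi[i][x[i]] for i in variables)
            + sum(psi[a][tuple(x[k] for k in a)] for a in factors))

def reparameterization(x):
    # Right-hand side of Equation (30) with kappa = 0.
    total = sum(c_var[i] * belief_var(i, x[i]) for i in variables)
    for a in factors:
        xa = tuple(x[k] for k in a)
        total += c_fac[a] * (belief_fac(a, xa)
                             - sum(c_var[k] * belief_var(k, x[k]) for k in a))
    return total

points = [dict(zip(variables, xs)) for xs in itertools.product(domain, repeat=3)]
max_gap = max(abs(objective(x) - reparameterization(x)) for x in points)
```

The gap is zero up to rounding even though the messages are arbitrary and some weights are negative, which matches the statement that admissibility needs only finiteness of the messages.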



The previous lemma guarantees that the beliefs corresponding to any vector of finite messages are
admissible, but an analogous lemma is not true for min-consistency. We require a stronger
assumption about the vector of messages in order to ensure min-consistency:

Lemma 8 Let $m$ be a fixed point of the message updates in Algorithm 1. The corresponding
beliefs, $b$, are min-consistent.

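Proof Up to an additive constant, we can write,

$$\min_{x_{\alpha \setminus i}} b_\alpha(x_\alpha) = \min_{x_{\alpha \setminus i}} \frac{\psi_\alpha(x_\alpha)}{c_\alpha} + \sum_{k \in \alpha} c_k \Big[ \frac{\phi_k(x_k)}{c_k} + (c_\alpha - 1)\, m_{\alpha \to k}(x_k) + \sum_{\beta \in \partial k \setminus \alpha} c_\beta\, m_{\beta \to k}(x_k) \Big]$$
$$= \min_{x_{\alpha \setminus i}} \frac{\psi_\alpha(x_\alpha)}{c_\alpha} + \sum_{k \in \alpha} c_k\, m_{k \to \alpha}(x_k)$$
$$= \min_{x_{\alpha \setminus i}} \Big[ \frac{\psi_\alpha(x_\alpha)}{c_\alpha} + \sum_{k \in \alpha \setminus i} c_k\, m_{k \to \alpha}(x_k) \Big] + c_i\, m_{i \to \alpha}(x_i)$$
$$= m_{\alpha \to i}(x_i) + m_{i \to \alpha}(x_i)$$
$$= m_{\alpha \to i}(x_i) + \frac{\phi_i(x_i)}{c_i} + (c_\alpha - 1)\, m_{\alpha \to i}(x_i) + \sum_{\beta \in \partial i \setminus \alpha} c_\beta\, m_{\beta \to i}(x_i)$$
$$= \frac{\phi_i(x_i)}{c_i} + \sum_{\beta \in \partial i} c_\beta\, m_{\beta \to i}(x_i)$$
$$= b_i(x_i)$$

■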



Again, for any objective function $f$ such that $|f(x)| < \infty$ for all $x$, there always exists a
fixed point of the min-sum message passing updates (see Theorem 2 of Wainwright et al.
(2004)). The proof of this statement can be translated almost exactly for our message
passing updates, and we will not reproduce it here.

3.1 Computation Trees

The computation trees produced by the synchronous splitting algorithm are different from
their predecessors. Again, the computation tree captures the messages that would need to
be passed in order to compute $b^t_i(x_i)$. However, the messages that are passed in the new
algorithm are multiplied by a non-zero constant. As a result, the potential at a node $u$ in
the computation tree corresponds to some potential in the original graph multiplied by a
constant that depends on all of the nodes above $u$ in the computation tree. We summarize
the changes as follows:

1. The message passed from $i$ to $\alpha$ may now depend on the message from $\alpha$ to $i$ at the
previous time step. As such, we now form the time $t + 1$ computation tree from the
time $t$ computation tree by taking any leaf $u$, which is a copy of node $v$ in the factor
graph, of the time $t$ computation tree, creating a new node for every $w \in \partial v$, and
connecting $u$ to these new nodes. As a result, the new computation tree rooted at
node $u$ of depth $t$ contains at least all of the non-backtracking walks of length $t$ in the
factor graph starting from $u$ and, at most, all walks of length $t$ in the factor graph
starting at $u$.

2. The messages are weighted by the elements of $c$. This changes the potentials at the
nodes in the computation tree. For example, suppose the computation tree was rooted
at variable node $i$ and that $b_i$ depends on the message from $\alpha$ to $i$. Because $m_{\alpha \to i}$
is multiplied by $c_\alpha$ in $b_i$, every potential along this branch of the computation tree
is multiplied by $c_\alpha$. To make this concrete, we can associate a weight to every edge
of the computation tree that corresponds to the constant that multiplies the message
passed across that edge. To compute the new potential at a variable node $i$ in the
computation tree, we now need to multiply the corresponding potential $\phi_i / c_i$ by each of
the weights corresponding to the edges that appear along the path from $i$ to the root
of the computation tree. An analogous process can be used to compute the potentials
at each of the factor nodes. The computation tree produced by the splitting algorithm
at time $t = 2$ for the factor graph in Figure 1 is pictured in Figure 5. Compare this
with the computation tree produced by the standard min-sum algorithm in Figure 2.

If we make these adjustments and all of the weights are positive, then the belief, $b_i^t(x_i)$,
at node $i$ at time $t$ is given by the min-marginal at the root of $T_i(t)$. If some of the weights
are negative, then $b_i^t(x_i)$ is computed by maximizing over each variable in $T_i(t)$ whose
self-potential has a negative weight and minimizing over each variable whose self-potential has
a non-negative weight. In this way, the beliefs correspond to marginals at the root of these
computation trees.

4. Optimality of Fixed Points 

Empirically, the standard min-sum algorithm need not converge and, even if it does, the 
estimate produced at convergence need not actually minimize the objective function. Up 
until this point, we have not placed any restriction on the vector c except that all of its entries 
are non-zero. Still, we know from the TRMP case that certain choices of the parameters 
are better than others: some ensure that the estimate obtained at a fixed point is correct. 

Figure 5: (a) Message passing tree. (b) New computation tree. Construction of the computation tree rooted at node $x_1$ at time $t = 2$ produced by
Algorithm 1 for the factor graph in Figure 1. The message passing tree with edge weights
corresponding to the constant that multiplies the message passed across the edge (left) is
converted into the new computation tree (right). Notice that the potentials in the new
computation tree are now weighted by elements of the parameter vector $c$.



From the previous section, we know that the fixed point beliefs produced by Algorithm
1 are admissible and min-consistent. From these fixed point beliefs, we construct a fixed
point estimate $x^*$ such that $x_i^* \in \arg\min_{x_i} b_i(x_i)$. If the objective function had a unique global
minimum and the fixed point beliefs were the true min-marginals, then $x^*$ would indeed
be the global minimum. Now, suppose that the $b_i$ are not the true min-marginals. What
can we say about the optimality of any vector $x^*$ such that $x_i^* \in \arg\min_{x_i} b_i(x_i)$? What can we
say if there is a unique vector $x^*$ with this property? Our primary tool for answering these
questions will be the following lemma:

Lemma 9 Let $b$ be a vector of min-consistent beliefs. If there exists a unique estimate $x^*$
that minimizes $b_i(x_i)$ for each $i$, then $x^*$ also minimizes $b_\alpha(x_\alpha)$ and, for any $i \in \alpha$, $x^*$
minimizes $b_\alpha(x_\alpha) - b_i(x_i)$.

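Proof Because the beliefs are min-consistent, for any $i \in \alpha$ we have:

$$\min_{x_{\alpha \setminus i}} b_\alpha(x_\alpha) = \kappa + b_i(x_i)$$

From this, we can conclude that there is some $x_\alpha$ that minimizes $b_\alpha$ with $x_i = x_i^*$.
Further, because the minimum is unique for each $b_i$, $x^*$ must minimize $b_\alpha$. Now fix a vector
$x$ and consider

\begin{align*}
b_\alpha(x_\alpha^*) - b_i(x_i^*) &= \min_{x_{\alpha \setminus i}} b_\alpha(x_i^*, x_{\alpha \setminus i}) - b_i(x_i^*) \\
&= \min_{x_{\alpha \setminus i}} b_\alpha(x_i, x_{\alpha \setminus i}) - b_i(x_i) \\
&\le b_\alpha(x_\alpha) - b_i(x_i)
\end{align*}

■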


This lemma will be a crucial building block of many of the theorems in this paper,
and many variants of this lemma have been proven in the literature (e.g. Lemma 4 in
Wainwright et al. (2004) and Theorem 1 in Weiss et al. (2007)).






Using this lemma and the observation of Lemma 7 that $f$ can be written as a sum of
the beliefs, we can convert questions about the optimality of the vector $x$ into questions
about the choice of parameters. We will show how to choose the $c_i$ and $c_\alpha$ such that we
will be guaranteed some form of optimality for a collection of admissible and min-consistent
beliefs.



4.1 Local Optimality 

A function $f$ has a local optimum at the point $x \in \mathcal{X}^n$ if there is some neighborhood of
$x$ such that $f$ does not improve in that neighborhood. The definition of neighborhood is
metric dependent, and in the interest of keeping our results applicable to a wide variety
of spaces, we will choose the metric to be the Hamming distance. For any two vectors
$x, y \in \mathcal{X}^n$, the Hamming distance is the number of entries in which the two vectors differ.
For the purposes of this paper, we will restrict our definition of local optimality to vectors
within Hamming distance one:



Definition 10 $x \in \mathcal{X}^n$ is a local minimum of the objective function, $f$, if for every vector
$y$ that has at most one entry different from $x$, $f(x) \le f(y)$.
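
Definition 10 can be checked by brute force on small discrete problems. The following is a minimal Python sketch, not part of the original development: the function name `is_local_min`, the finite `domain`, and the toy objective `table` are all hypothetical choices made for illustration.

```python
from itertools import product

def is_local_min(f, x, domain):
    """Return True if no single-coordinate change of x lowers f (Definition 10)."""
    fx = f(x)
    for i in range(len(x)):
        for v in domain:
            if v == x[i]:
                continue
            # y differs from x in exactly one entry (Hamming distance one)
            y = x[:i] + (v,) + x[i + 1:]
            if f(y) < fx:
                return False
    return True

# Hypothetical objective on {0, 1}^2: (1, 1) is the global minimum, while
# (0, 0) is only a local minimum with respect to Hamming distance one.
table = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 0.0}
f = table.__getitem__
local_minima = [x for x in product((0, 1), repeat=2) if is_local_min(f, x, (0, 1))]
```

In this toy example both `(0, 0)` and `(1, 1)` pass the test, which illustrates that local optimality in the sense of Definition 10 does not by itself imply global optimality.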



Our notion does not necessarily coincide with other notions of local optimality from the
literature (Wainwright et al., 2004). If the standard min-sum algorithm converges to a unique
estimate $x^*$, then $x^*$ is locally optimal in the following sense: $x^*$ is a global minimum of the
reparameterization when it is restricted to factor-induced subgraphs of the factor graph
that contain exactly one cycle (Wainwright et al., 2004). However, $x^*$ is not necessarily
a global optimum of the objective function. Suppose the factor graph consists of only
pairwise factors. In this case, the collection of nodes formed by taking some variable node
$i$ and every node in its two-hop neighborhood must be a tree, $T$. The restriction of the
reparameterization to this tree is given by:



$$R(x_T) = \sum_{j \in T} b_j(x_j) + \sum_{\alpha \in T} \Bigl[ b_\alpha(x_\alpha) - \sum_{k \in \alpha} b_k(x_k) \Bigr] \tag{32}$$



$R$ contains every part of the reparameterization that depends on the variable $x_i$, and $x^*$
minimizes $R$. As a result, we observe that if we change only the value of $x_i^*$, then we cannot
decrease the value of $R$ and, consequently, we cannot decrease the objective function. In this
case, the local optimality condition in Wainwright et al. (2004) does imply local optimality
in our sense. However, if the factorization is not pairwise, then the two-hop neighborhood of
any node is not necessarily cycle free (see Figure 6). Consequently, the notion of optimality
from Wainwright et al. (2004) need not correspond to Definition 10 for graphs where the
factorization is not pairwise.

Figure 6: A factor graph for which the two-hop neighborhood of every node is not a tree.

We will show that there exist choices of the parameters for which any fixed point estimate
extracted from a vector of admissible and min-consistent beliefs that simultaneously
minimizes all of the beliefs is guaranteed to be locally optimal with respect to the Hamming
distance. In order to prove such a result, we first need to relate the minima of the fixed
point beliefs to the minima of the objective function. By Lemma 7, the objective function
$f$ can be written as a sum of the beliefs. Let $b$ be a vector of admissible beliefs for the
function $f$. Define $-j = \{1, \ldots, n\} \setminus \{j\}$. For a fixed $x_{-j}$, we can lower bound the optimum
value of the objective function as follows:



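\begin{align}
\min_{x_j} f(x_j, x_{-j}) &= \min_{x_j} \Bigl[ \kappa + \sum_i c_i b_i(x_i) + \sum_\alpha c_\alpha \Bigl[ b_\alpha(x_\alpha) - \sum_{k \in \alpha} c_k b_k(x_k) \Bigr] \Bigr] \tag{33} \\
&= g_j(x_{-j}) + \min_{x_j} \Bigl[ c_j b_j(x_j) + \sum_{\alpha \in \partial j} c_\alpha \bigl[ b_\alpha(x_\alpha) - c_j b_j(x_j) \bigr] \Bigr] \tag{34} \\
&= g_j(x_{-j}) + \min_{x_j} \Bigl[ \Bigl(1 - \sum_{\alpha \in \partial j} c_\alpha\Bigr) c_j b_j(x_j) + \sum_{\alpha \in \partial j} c_\alpha b_\alpha(x_\alpha) \Bigr] \tag{35} \\
&\ge g_j(x_{-j}) + \min_{x_j} \Bigl[ \Bigl(1 - \sum_{\alpha \in \partial j} c_\alpha\Bigr) c_j b_j(x_j) \Bigr] + \sum_{\alpha \in \partial j} \min_{x_j} \bigl[ c_\alpha b_\alpha(x_\alpha) \bigr] \tag{36}
\end{align}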



where $g_j(x_{-j})$ is the part of the reparameterization that does not depend on $x_j$. The
last inequality is tight whenever there is a value of $x_j$ that simultaneously minimizes each
component of the sum. If the coefficients of $b_j$ and each of the $b_\alpha$ in Equation 36 were
non-negative, then we could rewrite this bound as

$$\min_{x_j} f(x_j, x_{-j}) \ge g_j(x_{-j}) + \Bigl(1 - \sum_{\alpha \in \partial j} c_\alpha\Bigr) c_j \min_{x_j} b_j(x_j) + \sum_{\alpha \in \partial j} c_\alpha \min_{x_j} b_\alpha(x_\alpha) \tag{37}$$

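which depends on the minima of each of the beliefs. Recall from Lemma 9 that any unique
estimate must simultaneously minimize $b_i$, $b_\alpha$, and, for $i \in \alpha$, $b_\alpha - b_i$. So, in general, we
want to know if we can write

\begin{align}
&\Bigl(1 - \sum_{\alpha \in \partial j} c_\alpha\Bigr) c_j b_j(x_j) + \sum_{\alpha \in \partial j} c_\alpha b_\alpha(x_\alpha) \tag{38} \\
&\quad = d_{jj} b_j(x_j) + \sum_{\alpha \in \partial j} d_{\alpha\alpha} b_\alpha(x_\alpha) + \sum_{\alpha \in \partial j} d_{j\alpha} \bigl[ b_\alpha(x_\alpha) - b_j(x_j) \bigr] \tag{39}
\end{align}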

for some vector of non-negative constants $d$. This motivates the following definition:

Definition 11 A function, $h$, can be written as a conical combination of the beliefs if
there exists a vector of non-negative reals, $d$, such that

$$h(x) = \kappa + \sum_{i, \alpha : i \in \alpha} d_{i\alpha} \bigl( b_\alpha(x_\alpha) - b_i(x_i) \bigr) + \sum_\alpha d_{\alpha\alpha} b_\alpha(x_\alpha) + \sum_i d_{ii} b_i(x_i)$$

The set of all conical combinations of a collection of vectors in $\mathbb{R}^n$ forms a cone in $\mathbb{R}^n$
in the same way that the set of all convex combinations of vectors in $\mathbb{R}^n$ forms a convex set in $\mathbb{R}^n$.

The above definition is very similar to the definition of "provably convex" in Weiss et al. (2007).
There, an entropy approximation is provably convex if it can be written as a conical
combination of the entropy functions corresponding to each of the factors. In contrast, our
approach follows from a reparameterization of the objective function.

Putting all of the above ideas together, we have the following theorem: 

Theorem 12 Let $b$ be a vector of admissible and min-consistent beliefs for the function
$f$ with a corresponding weighting vector of non-zero real numbers, $c$, such that for all $i$,
$c_i b_i(x_i) + \sum_{\alpha \in \partial i} c_\alpha \bigl[ b_\alpha(x_\alpha) - c_i b_i(x_i) \bigr]$ can be written as a conical combination of the beliefs.
If the beliefs admit a unique estimate, $x^*$, then $x^*$ is a local minimum (with respect to the
Hamming distance) of the objective function.

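Proof By Lemma 7, the beliefs reparameterize the objective function. Choose a $j \in \{1, \ldots, n\}$.
By assumption, the portion of the objective function that depends on the variable
$x_j$ can be written as a conical combination of the beliefs. By admissibility, we can write

\begin{align*}
f(x^*) &= \kappa + \sum_i c_i b_i(x_i^*) + \sum_\alpha c_\alpha \Bigl[ b_\alpha(x_\alpha^*) - \sum_{k \in \alpha} c_k b_k(x_k^*) \Bigr] \\
&= g_j(x_{-j}^*) + c_j b_j(x_j^*) + \sum_{\alpha \in \partial j} c_\alpha \bigl[ b_\alpha(x_\alpha^*) - c_j b_j(x_j^*) \bigr] \\
&= g_j(x_{-j}^*) + d_{jj} b_j(x_j^*) + \sum_{\alpha \in \partial j} d_{\alpha\alpha} b_\alpha(x_\alpha^*) + \sum_{\alpha \in \partial j} d_{j\alpha} \bigl[ b_\alpha(x_\alpha^*) - b_j(x_j^*) \bigr] \\
&\le g_j(x_{-j}^*) + d_{jj} b_j(x_j) + \sum_{\alpha \in \partial j} d_{\alpha\alpha} b_\alpha(x_j, x_{\alpha \setminus j}^*) + \sum_{\alpha \in \partial j} d_{j\alpha} \bigl[ b_\alpha(x_j, x_{\alpha \setminus j}^*) - b_j(x_j) \bigr] \\
&= f(x_j, x_{-j}^*)
\end{align*}

for any $x_j \in \mathcal{X}$, where the inequality follows from Lemma 9. We can repeat this proof for
each $j \in \{1, \ldots, n\}$. ■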

Theorem 12 tells us that, under suitable choices of the parameters, no vector $x$ within
Hamming distance one of $x^*$ can decrease the objective function. For a differentiable function
$f$, we can infer that the gradient of $f$ at the point $x^*$ must be zero. Further, by
the second derivative test and the observation that the function can only increase in value
along the coordinate axes, $x^*$ is either a local minimum or a saddle point of $f$. For a convex
differentiable function $f$, this condition is equivalent to global optimality:

Corollary 13 Let $b$ be a vector of admissible and min-consistent beliefs for a differentiable
convex function $f$ with a corresponding weighting vector of non-zero real numbers, $c$, such
that for all $i$, $c_i b_i(x_i) + \sum_{\alpha \in \partial i} c_\alpha \bigl[ b_\alpha(x_\alpha) - c_i b_i(x_i) \bigr]$ can be written as a conical combination of
the beliefs. If the beliefs admit a unique estimate, $x^*$, then $x^*$ is a global minimum of the
objective function.

Corollary 14 The standard min-sum algorithm with $c_i = 1$ for all $i$ and $c_\alpha = 1$ for all $\alpha$
always satisfies the conditions of Theorem 12.
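
As a brief check of Corollary 14 (a worked instance, not a replacement for a proof), set every weight to one in the expression that appears in the hypothesis of Corollary 13. The terms that depend on $x_j$ then regroup directly into the form required by Definition 11:

```latex
\begin{align*}
c_j b_j(x_j) + \sum_{\alpha \in \partial j} c_\alpha \bigl[ b_\alpha(x_\alpha) - c_j b_j(x_j) \bigr]
  &= b_j(x_j) + \sum_{\alpha \in \partial j} \bigl[ b_\alpha(x_\alpha) - b_j(x_j) \bigr]
  \qquad (c_j = 1,\ c_\alpha = 1).
\end{align*}
```

Reading off the coefficients gives $d_{jj} = 1$, $d_{j\alpha} = 1$ for every $\alpha \in \partial j$, and $d_{\alpha\alpha} = 0$, all of which are non-negative.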
4.2 Global Optimality



We now extend the approach of the previous section to show that there are choices of
the vector $c$ that guarantee the global optimality of any unique estimate produced from
admissible and min-consistent beliefs. As before, suppose $b$ is a vector of admissible beliefs
for the function $f$. If $f$ can be written as a conical combination of the beliefs, then we can
lower bound the optimal value of the objective function as follows:



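\begin{align}
\min_x f(x) &= \min_x \Bigl[ \kappa + \sum_i c_i b_i(x_i) + \sum_\alpha c_\alpha \Bigl[ b_\alpha(x_\alpha) - \sum_{k \in \alpha} c_k b_k(x_k) \Bigr] \Bigr] \tag{40} \\
&= \min_x \Bigl[ \kappa + \sum_{i, \alpha : i \in \alpha} d_{i\alpha} \bigl( b_\alpha(x_\alpha) - b_i(x_i) \bigr) + \sum_\alpha d_{\alpha\alpha} b_\alpha(x_\alpha) + \sum_i d_{ii} b_i(x_i) \Bigr] \tag{41} \\
&\ge \kappa + \sum_{i, \alpha : i \in \alpha} d_{i\alpha} \min_{x_\alpha} \bigl( b_\alpha(x_\alpha) - b_i(x_i) \bigr) + \sum_\alpha d_{\alpha\alpha} \min_{x_\alpha} b_\alpha(x_\alpha) + \sum_i d_{ii} \min_{x_i} b_i(x_i) \tag{42}
\end{align}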



This analysis provides us with our first global optimality result. We note that the following
theorem also appears as Theorem 1 in Meltzer et al. (2009), and Theorem 2 in
Wainwright et al. (2003) provides a similar proof for the TRMP algorithm.



Theorem 15 Let $b$ be a vector of admissible and min-consistent beliefs for the function
$f$ with a corresponding weighting vector of non-zero real numbers, $c$, such that $f$ can be
written as a conical combination of the beliefs. If the beliefs admit a unique estimate, $x^*$,
then $x^*$ minimizes the objective function.

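Proof Choose $x \in \mathcal{X}^n$. Using Definition 11 and Lemma 9, we can write

\begin{align}
f(x^*) &= \kappa + \sum_{i, \alpha : i \in \alpha} d_{i\alpha} \bigl( b_\alpha(x_\alpha^*) - b_i(x_i^*) \bigr) + \sum_\alpha d_{\alpha\alpha} b_\alpha(x_\alpha^*) + \sum_i d_{ii} b_i(x_i^*) \tag{43} \\
&\le \kappa + \sum_{i, \alpha : i \in \alpha} d_{i\alpha} \bigl( b_\alpha(x_\alpha) - b_i(x_i) \bigr) + \sum_\alpha d_{\alpha\alpha} b_\alpha(x_\alpha) + \sum_i d_{ii} b_i(x_i) \tag{44} \\
&= f(x) \tag{45}
\end{align}

■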



Theorem 15 also provides us with a simple proof that the standard min-sum algorithm
is correct on a tree: 

Corollary 16 Suppose the factor graph is a tree. If the admissible and min-consistent 
beliefs produced by the standard min-sum algorithm admit a unique estimate, x* , then x* is 
the global minimum of the objective function. 

Proof Let $b$ be the vector of min-consistent and admissible beliefs obtained from running
the standard min-sum algorithm. Choose a node $r \in G$ and consider the factor graph as a
tree rooted at a variable node, $r$. Let $p(\alpha)$ denote the parent of factor node $\alpha \in G$. We can
now write,

\begin{align*}
f(x) &= \sum_i b_i(x_i) + \sum_\alpha \Bigl[ b_\alpha(x_\alpha) - \sum_{k \in \alpha} b_k(x_k) \Bigr] \\
&= b_r(x_r) + \sum_\alpha \bigl[ b_\alpha(x_\alpha) - b_{p(\alpha)}(x_{p(\alpha)}) \bigr]
\end{align*}

Hence, we can conclude that $f$ can be written as a conical combination of the beliefs
and apply Theorem 15. ■



Given Theorem 15, starting with the vector $d$ seems slightly more natural than
starting with the vector $c$. Consider any non-negative real vector $d$. We now show that we
can find a vector $c$ such that $f$ has a conical decomposition in terms of $d$, provided $d$ satisfies
a mild condition.

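Choose the vector $c$ as follows:

\begin{align}
c_\alpha &= d_{\alpha\alpha} + \sum_{i \in \alpha} d_{i\alpha} \tag{46} \\
c_i &= \frac{d_{ii} - \sum_{\alpha \in \partial i} d_{i\alpha}}{1 - \sum_{\alpha \in \partial i} c_\alpha} \tag{47}
\end{align}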



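These equations are valid whenever $1 - \sum_{\alpha \in \partial i} c_\alpha \ne 0$. Note that any valid
reparameterization must have $c_i \ne 0$ and $c_\alpha \ne 0$ for all $\alpha$ and $i$. Hence, $d_{\alpha\alpha} + \sum_{i \in \alpha} d_{i\alpha} \ne 0$ and
$d_{ii} \ne \sum_{\alpha \in \partial i} d_{i\alpha}$. Now, for this choice of $c$, we have: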



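\begin{align}
f(x) &= \kappa + \sum_i c_i b_i(x_i) + \sum_\alpha c_\alpha \Bigl[ b_\alpha(x_\alpha) - \sum_{k \in \alpha} c_k b_k(x_k) \Bigr] \tag{48} \\
&= \kappa + \sum_i c_i \Bigl(1 - \sum_{\alpha \in \partial i} c_\alpha\Bigr) b_i(x_i) + \sum_\alpha c_\alpha b_\alpha(x_\alpha) \tag{49} \\
&= \kappa + \sum_i \Bigl(d_{ii} - \sum_{\alpha \in \partial i} d_{i\alpha}\Bigr) b_i(x_i) + \sum_\alpha \Bigl(d_{\alpha\alpha} + \sum_{i \in \alpha} d_{i\alpha}\Bigr) b_\alpha(x_\alpha) \tag{50} \\
&= \kappa + \sum_i d_{ii} b_i(x_i) + \sum_\alpha d_{\alpha\alpha} b_\alpha(x_\alpha) + \sum_{i, \alpha : i \in \alpha} d_{i\alpha} \bigl( b_\alpha(x_\alpha) - b_i(x_i) \bigr) \tag{51}
\end{align}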



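In the case that $1 - \sum_{\alpha \in \partial i} c_\alpha = 0$, $c_i$ can be chosen to be any non-zero real. Again,
any valid reparameterization must have $c_i \ne 0$ and $c_\alpha \ne 0$ for all $\alpha$ and $i$. Hence,
$d_{\alpha\alpha} + \sum_{i \in \alpha} d_{i\alpha} \ne 0$, but, unlike the previous case, we must have $d_{ii} - \sum_{\alpha \in \partial i} d_{i\alpha} = 0$. The
remainder of the proof then follows exactly as above.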
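
The passage from $d$ to $c$ can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `c_from_d`, the dictionary layout, and the two-variable example are hypothetical, and the degenerate case $1 - \sum_{\alpha \in \partial i} c_\alpha = 0$ (where $c_i$ is a free non-zero choice) is reported rather than resolved.

```python
def c_from_d(d_var, d_fac, d_edge, factors):
    """Map non-negative weights d to a parameter vector c.

    d_var[i] = d_ii, d_fac[a] = d_aa, d_edge[(i, a)] = d_ia, and
    factors[a] is the tuple of variables that factor a depends on.
    """
    # c_a = d_aa + sum over i in a of d_ia
    c_fac = {a: d_fac[a] + sum(d_edge[(i, a)] for i in scope)
             for a, scope in factors.items()}
    c_var = {}
    for i in d_var:
        nbrs = [a for a, scope in factors.items() if i in scope]
        denom = 1.0 - sum(c_fac[a] for a in nbrs)
        numer = d_var[i] - sum(d_edge[(i, a)] for a in nbrs)
        if denom == 0.0:
            # Degenerate case: c_i may be any non-zero real (requires numer == 0).
            raise ValueError("1 - sum of c_a is zero for variable %r" % (i,))
        c_var[i] = numer / denom
    return c_var, c_fac

# Hypothetical example: two variables sharing one pairwise factor "a".
factors = {"a": (0, 1)}
d_var = {0: 0.5, 1: 0.5}
d_fac = {"a": 0.5}
d_edge = {(0, "a"): 0.0, (1, "a"): 0.0}
c_var, c_fac = c_from_d(d_var, d_fac, d_edge, factors)
```

For this example the coefficient of each $b_i$ in the reparameterization, $c_i (1 - c_a)$, equals $d_{ii} = 0.5$, as the derivation above requires.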

We now address the following question: given a factorization of the objective function
$f$, is it always possible to choose the vector $c$ in order to guarantee that any unique estimate
produced from min-consistent and admissible beliefs minimizes the objective function? The
answer to this question is yes, and we will provide a simple condition on the vector $c$ that
will ensure this. Again, suppose $b$ is a vector of admissible beliefs for the function $f$. We
can lower bound $f$ as



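\begin{align}
\min_x f(x) &= \min_x \Bigl[ \kappa + \sum_i c_i b_i(x_i) + \sum_\alpha c_\alpha \Bigl[ b_\alpha(x_\alpha) - \sum_{k \in \alpha} c_k b_k(x_k) \Bigr] \Bigr] \tag{52} \\
&= \min_x \Bigl[ \kappa + \sum_i \Bigl(1 - \sum_{\alpha \in \partial i} c_\alpha\Bigr) c_i b_i(x_i) + \sum_\alpha c_\alpha b_\alpha(x_\alpha) \Bigr] \tag{53} \\
&\ge \kappa + \sum_i \min_{x_i} \Bigl[ \Bigl(1 - \sum_{\alpha \in \partial i} c_\alpha\Bigr) c_i b_i(x_i) \Bigr] + \sum_\alpha \min_{x_\alpha} \bigl[ c_\alpha b_\alpha(x_\alpha) \bigr] \tag{54}
\end{align}
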

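Observe that if $(1 - \sum_{\alpha \in \partial i} c_\alpha) c_i \ge 0$ for all $i$ and $c_\alpha > 0$ for all $\alpha$, then we can further
rewrite the bound as:

$$\min_x f(x) \ge \kappa + \sum_i \Bigl(1 - \sum_{\alpha \in \partial i} c_\alpha\Bigr) c_i \min_{x_i} b_i(x_i) + \sum_\alpha c_\alpha \min_{x_\alpha} b_\alpha(x_\alpha) \tag{55}$$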

This analysis yields the following theorem: 

Theorem 17 Let $b$ be a vector of admissible and min-consistent beliefs for the function $f$
with a corresponding weighting vector of non-zero real numbers, $c$, such that

1. For all $i$, $(1 - \sum_{\alpha \in \partial i} c_\alpha) c_i \ge 0$

2. For all $\alpha$, $c_\alpha > 0$

If the beliefs admit a unique estimate, $x^*$, then $x^*$ minimizes the objective function.

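Proof By Lemma 7, up to a constant, we can write

\begin{align*}
f(x) &= \sum_i c_i b_i(x_i) + \sum_\alpha c_\alpha \Bigl[ b_\alpha(x_\alpha) - \sum_{k \in \alpha} c_k b_k(x_k) \Bigr] \\
&= \sum_i \Bigl[ \Bigl(1 - \sum_{\alpha \in \partial i} c_\alpha\Bigr) c_i b_i(x_i) \Bigr] + \sum_\alpha c_\alpha b_\alpha(x_\alpha)
\end{align*}

Now, by assumption, $(1 - \sum_{\alpha \in \partial i} c_\alpha) c_i$ and $c_\alpha$ are non-negative real numbers. Therefore,
by Lemma 9, we can conclude that the assignment $x^*$ simultaneously minimizes each of the
beliefs and hence minimizes the function $f$. ■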

This result is quite general; for any choice of $c$ such that $c_\alpha > 0$ for all $\alpha \in \mathcal{A}$, there
exists a choice of $c_i$ for each $i$ such that the conditions of the above theorem are satisfied.
The following corollary is an immediate consequence of this observation and Theorem 17.

Corollary 18 Given any function $f(x) = \sum_i \phi_i(x_i) + \sum_\alpha \psi_\alpha(x_\alpha)$, there exists a choice of
a non-zero parameter vector $c$ such that for any vector of admissible and min-consistent
beliefs for the function $f$, if the beliefs admit a unique estimate, $x^*$, then this estimate
minimizes $f$.
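
One way to realize the existence claim behind Corollary 18 is sketched below in Python. The sign rule used here (take $c_i = 1$ when $1 - \sum_{\alpha \in \partial i} c_\alpha \ge 0$ and $c_i = -1$ otherwise) is an assumption made for illustration; the text only asserts that some valid choice exists. The function names and the triangle example are likewise hypothetical.

```python
def choose_variable_weights(c_fac, factors, variables):
    """Pick c_i in {+1, -1} so that (1 - sum_{a in di} c_a) * c_i >= 0 for every i."""
    if any(w <= 0 for w in c_fac.values()):
        raise ValueError("every factor weight c_a must be positive")
    c_var = {}
    for i in variables:
        slack = 1.0 - sum(c_fac[a] for a, scope in factors.items() if i in scope)
        # Match the sign of c_i to the sign of the slack term.
        c_var[i] = 1.0 if slack >= 0 else -1.0
    return c_var

def satisfies_theorem_17(c_var, c_fac, factors):
    """Check both numbered conditions of Theorem 17."""
    for i, ci in c_var.items():
        total = sum(c_fac[a] for a, scope in factors.items() if i in scope)
        if (1.0 - total) * ci < 0:
            return False
    return all(w > 0 for w in c_fac.values())

# Hypothetical pairwise model on a triangle with unit factor weights.
factors = {"a": (0, 1), "b": (1, 2), "c": (0, 2)}
c_fac = {"a": 1.0, "b": 1.0, "c": 1.0}
c_var = choose_variable_weights(c_fac, factors, variables=range(3))
ok = satisfies_theorem_17(c_var, c_fac, factors)
```

Every variable in the triangle touches two factors, so the slack is $-1$ and the sketch selects $c_i = -1$, which makes the product in condition 1 equal to $+1$.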

Up until this point, we have been assuming that the estimate produced at the fixed 
point was unique when, in fact, all of the previous theorems are equally valid for any vector 
that simultaneously minimizes all of the beliefs. However, finding such a vector may be 
difficult outside of special cases. 

Lastly, we note that there are choices of the parameters for which we are guaranteed
local optimality but not global optimality. The difference between Theorem 12 and Theorem
15 is that the former only requires that the part of the reparameterization depending on
a single variable can be written as a conical combination of the beliefs, whereas the latter
requires the entire reparameterization to be a conical combination of the beliefs. The
standard min-sum algorithm always guarantees local optimality, and there are applications
for which the algorithm is known to produce local optima that are not globally optimal
(Weiss and Freeman, 2001).



4.3 Relation to TRMP 

The TRMP algorithm is a special case of our algorithm. We consider the algorithm on a
pairwise factor graph $G$ with corresponding objective function $f$. Let $\mathcal{T}$ be the set of all
spanning trees on $G$, and let $\mu$ be a probability distribution over $\mathcal{T}$. We define $c_i = 1$ for
all $i$ and $c_{ij} = \Pr_\mu[(i, j) \in T]$ corresponding to the edge appearance probabilities. Let $b$ be
a vector of admissible and min-consistent beliefs for $f$. We can write the objective function
$f$ as



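\begin{align}
f(x) &= \sum_{i \in G} b_i(x_i) + \sum_{(i,j) \in G} c_{ij} \bigl[ b_{ij}(x_i, x_j) - b_i(x_i) - b_j(x_j) \bigr] \tag{56} \\
&= \sum_{T \in \mathcal{T}} \mu(T) \Bigl[ \sum_{i \in T} b_i(x_i) + \sum_{(i,j) \in T} \bigl[ b_{ij}(x_i, x_j) - b_i(x_i) - b_j(x_j) \bigr] \Bigr] \tag{57}
\end{align}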



The remainder of this argument is now a generalized version of Corollary 16. For each
$T \in \mathcal{T}$, designate a variable node $r_T \in T$ as the root of $T$. Let $p_T(\alpha)$ denote the parent of
factor node $\alpha \in T$. We can now write,



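\begin{align}
f(x) &= \sum_{T \in \mathcal{T}} \mu(T) \Bigl[ \sum_{i \in T} b_i(x_i) + \sum_{(i,j) \in T} \bigl[ b_{ij}(x_i, x_j) - b_i(x_i) - b_j(x_j) \bigr] \Bigr] \tag{58} \\
&= \sum_{T \in \mathcal{T}} \mu(T) \Bigl[ b_{r_T}(x_{r_T}) + \sum_{i \in T, i \ne r_T} \bigl[ b_{i p_T(i)}(x_i, x_{p_T(i)}) - b_{p_T(i)}(x_{p_T(i)}) \bigr] \Bigr] \tag{59}
\end{align}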



Because $\mu(T) \ge 0$ for all $T \in \mathcal{T}$, we can conclude that $f$ can be written as a conical
combination of the beliefs. By Theorem 15, convergence of the TRMP algorithm to a unique
estimate implies correctness. A similar argument can be made if $\mu$ is a distribution over all
subgraphs of $G$ containing at most one cycle.

Computing the vector $c$ for the TRMP algorithm requires finding a distribution on
spanning trees. For arbitrary graphs, computing such a distribution is nontrivial, especially
if we want to preserve the distributed nature of the algorithm; we would need to compute
enough spanning trees so that every edge is contained in at least one spanning tree. If
we had simply chosen the $c_{ij} > 0$ such that for all $i$, $\sum_{k \in \partial i} c_{ki} \le 1$, we would have been
guaranteed global optimality by Theorem 17 without the additional work of choosing a
distribution over spanning trees.
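
To make the edge appearance probabilities concrete, the following Python sketch computes $c_{ij} = \Pr_\mu[(i,j) \in T]$ from an explicitly listed distribution over spanning trees. The function name, the input format, and the 3-cycle example with a uniform distribution are hypothetical choices for illustration; enumerating trees this way is only practical for very small graphs.

```python
from collections import defaultdict
from fractions import Fraction

def edge_appearance_probabilities(tree_distribution):
    """Sum the probability of every listed tree that contains each edge.

    tree_distribution is a list of (edges, probability) pairs, where edges
    is an iterable of (i, j) pairs describing one spanning tree.
    """
    c = defaultdict(Fraction)
    for edges, prob in tree_distribution:
        for e in edges:
            c[tuple(sorted(e))] += prob
    return dict(c)

# The 3-cycle on {0, 1, 2} has three spanning trees; weight them uniformly.
trees = [
    ([(0, 1), (1, 2)], Fraction(1, 3)),
    ([(1, 2), (0, 2)], Fraction(1, 3)),
    ([(0, 1), (0, 2)], Fraction(1, 3)),
]
c_edge = edge_appearance_probabilities(trees)
```

Each edge of the cycle lies in two of the three trees, so every $c_{ij}$ in this example equals $2/3$.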
4.4 Interpreting the Beliefs

We conclude this section with an exploration of the following question: what is the
relationship between min-consistent and admissible beliefs, $b$, and the true min-marginals?
Let $f_i(x_i) = \min_{x_{-i}} f(x)$. For the standard min-sum algorithm, when the factor graph
is a tree and the initial messages are all chosen to be the zero message, we know that
$b_i(x_i) = f_i(x_i) + \kappa$. This is a direct consequence of the observation that the beliefs are the
true min-marginals on the computation trees.

Now, let $b$ be a vector of beliefs that is admissible and min-consistent for the objective
function $f$. If the parameter vector, $c$, satisfies the conditions of Theorem 12, then we can
lower bound the min-marginal $f_j$ as:



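\begin{align}
\min_{x_{-j}} f(x) &= \min_{x_{-j}} \Bigl[ \kappa + \sum_i c_i b_i(x_i) + \sum_\alpha c_\alpha \Bigl[ b_\alpha(x_\alpha) - \sum_{k \in \alpha} c_k b_k(x_k) \Bigr] \Bigr] \tag{60} \\
&\ge \min_{x_{-j}} g_j(x_{-j}) + \min_{x_{-j}} \Bigl[ c_j b_j(x_j) + \sum_{\alpha \in \partial j} c_\alpha \bigl[ b_\alpha(x_\alpha) - c_j b_j(x_j) \bigr] \Bigr] \tag{61} \\
&= \kappa' + \min_{x_{-j}} \Bigl[ d_{jj} b_j(x_j) + \sum_{\alpha \in \partial j} d_{j\alpha} \bigl[ b_\alpha(x_\alpha) - b_j(x_j) \bigr] + \sum_{\alpha \in \partial j} d_{\alpha\alpha} b_\alpha(x_\alpha) \Bigr] \tag{62} \\
&\ge \kappa' + d_{jj} b_j(x_j) + \sum_{\alpha \in \partial j} d_{j\alpha} \min_{x_{-j}} \bigl[ b_\alpha(x_\alpha) - b_j(x_j) \bigr] + \sum_{\alpha \in \partial j} d_{\alpha\alpha} \min_{x_{-j}} b_\alpha(x_\alpha) \tag{63} \\
&= \kappa'' + d_{jj} b_j(x_j) + \sum_{\alpha \in \partial j} d_{\alpha\alpha} b_j(x_j) \tag{64} \\
&= \kappa'' + \Bigl( d_{jj} + \sum_{\alpha \in \partial j} d_{\alpha\alpha} \Bigr) b_j(x_j) \tag{65}
\end{align}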



In other words, $f_j$ is lower bounded by $b_j$ after an appropriate rescaling and shifting. A
similar argument can be made if $f$ can be written as a conical combination of the beliefs.
This argument results in the following theorem:

Theorem 19 Let $b$ be a vector of admissible and min-consistent beliefs for the function $f$.
If the parameter vector $c$ satisfies the conditions for either global optimality as in Theorem
15 or local optimality as in Theorem 12, then for all $i$ there exists $d_i \ge 0$ and a constant $\kappa_i$
such that $f_i(x_i) \ge \kappa_i + d_i b_i(x_i)$.

Any collection of admissible and min-consistent beliefs, after an appropriate scaling and
shifting, lower bounds the true min-marginals. There are two special cases of this theorem
worth noting:

1. If $c$ is the all ones vector, corresponding to the standard min-sum algorithm, then
$d_i = 1$ for all $i$. Further, if the factor graph is a tree then the inequality is actually an
equality.

2. If $c_\alpha > 0$ for all $\alpha$, $c_i = 1$ for all $i$, and $\sum_{\alpha \in \partial i} c_\alpha \le 1$ for all $i$, then $d_i = 1$ for all $i$.

In both of these cases, the scale factor is one, and as a result, up to a constant, any collection
of admissible and min-consistent beliefs lower bounds the true min-marginals.
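
The second special case can be verified with a short calculation, sketched here under the assumption that the scale factor is the combined coefficient $d_j = d_{jj} + \sum_{\alpha \in \partial j} d_{\alpha\alpha}$ obtained at the end of the derivation above. With $c_j = 1$, the part of the reparameterization that depends on $x_j$ is already a conical combination:

```latex
\begin{align*}
\Bigl(1 - \sum_{\alpha \in \partial j} c_\alpha\Bigr) b_j(x_j) + \sum_{\alpha \in \partial j} c_\alpha b_\alpha(x_\alpha)
  &\;\Longrightarrow\;
  d_{jj} = 1 - \sum_{\alpha \in \partial j} c_\alpha \ge 0, \quad
  d_{\alpha\alpha} = c_\alpha > 0, \quad
  d_{j\alpha} = 0, \\
d_j = d_{jj} + \sum_{\alpha \in \partial j} d_{\alpha\alpha}
  &= \Bigl(1 - \sum_{\alpha \in \partial j} c_\alpha\Bigr) + \sum_{\alpha \in \partial j} c_\alpha = 1 .
\end{align*}
```

The condition $\sum_{\alpha \in \partial j} c_\alpha \le 1$ is exactly what keeps $d_{jj}$ non-negative.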
5. Convergence

Throughout the previous section, we assumed that we somehow obtained beliefs that were
both admissible and min-consistent. We know that, for bounded objective functions, beliefs
satisfying these two properties are produced by Algorithm 1 if it converges, but thus far, we
have avoided the issue of convergence. In order to apply the results of Section 4, we will
need the beliefs to converge to a collection of admissible and min-consistent beliefs.
Traditionally, the min-sum algorithm is said to converge if the beliefs at two consecutive
time steps do not change by more than some $\epsilon > 0$. There are two potential algorithmic
behaviors that we would like to avoid:

1. The beliefs do not converge to a vector of admissible and min-consistent beliefs. 

2. The vector of messages at time t is not finite as a result of performing the message 
updates. 

If the vector of messages is not finite, we may not be able to construct a corresponding
vector of admissible beliefs as in Lemma 7. Unfortunately, there are distributions for which
it is possible that the min-sum algorithm will generate infinite messages at some time step
(e.g. the quadratic minimization problem of Ruozzi and Tatikonda (2010)). However, case 2
cannot arise for any distribution $f$ such that $\phi_i$ and $\psi_\alpha$ are bounded real-valued functions
for each $\alpha \in \mathcal{A}$ and each $i$.

In this section, we will focus on case 1. The standard approach to force convergence of 
the messages and beliefs is to apply a damping factor to the message updates. This tends to 
work well empirically, but choosing the correct damping factor remains somewhat of an art 
form. In what follows, we will pursue a different approach: modifying the message passing 
schedule to ensure convergence. 
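
For reference, the damping heuristic mentioned above is commonly implemented as a convex combination of the previous message and the freshly computed one. The Python sketch below shows that common form; the text does not specify a particular damping rule, so the function name `damped_update`, the parameter `damping`, and the convention that larger values mean slower change are all assumptions.

```python
def damped_update(old_message, proposed_message, damping):
    """Blend the previous message with the newly computed one.

    damping = 0 reproduces the undamped update; values close to 1 make the
    message change slowly from one iteration to the next.
    """
    if not 0.0 <= damping < 1.0:
        raise ValueError("damping must lie in [0, 1)")
    return [damping * old + (1.0 - damping) * new
            for old, new in zip(old_message, proposed_message)]

# Hypothetical message over a two-state variable.
blended = damped_update([0.0, 0.0], [2.0, 4.0], damping=0.5)
```

Choosing `damping` is the tuning step that the text describes as "somewhat of an art form"; the schedule-based approach pursued below avoids this parameter altogether.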

5.1 Asynchronous Message Passing 

Consider the local and global optimality results of Section 4. Each of these results relied on
a particular lower bound on the objective function (or at least part of the objective function)
being tight. Similar in spirit to the recent work of Kolmogorov (2006) and Meltzer et al.
(2009), we will present a message passing schedule that is guaranteed to improve a specific
lower bound on the objective function at each time step. We will say that this algorithm
has converged if the lower bound cannot be improved by subsequent iteration.

Empirically, Algorithm 1 does not always converge, but the synchronous message passing
schedule of Algorithm 1 is only one such schedule for the message updates. Consider the
alternative message passing schedule in Algorithm 2. This asynchronous message passing
schedule fixes an ordering on the variables and for each $i$, in order, updates all of the
messages from each $\alpha \in \partial i$ to $i$ as if $i$ were the root of the subtree containing only $\alpha$ and its
neighbors. We will show that, for certain choices of the parameter vector $c$, this message
passing schedule improves a specific lower bound at each iteration.

By using the asynchronous schedule in Algorithm 2, we seem to lose the distributed
nature of the parallel message updates. Fortunately, for some asynchronous schedules, we
can actually parallelize the updating process by performing concurrent updates as long as
the simultaneous updates do not form a cycle (e.g. we could randomly select a subset
of the message updates that do not interfere). We also note that updating over larger
subtrees may be advantageous. Other algorithms, such as those in Kolmogorov (2006) and
Sontag and Jaakkola (2009), perform updates over specific trees.



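Algorithm 2 Asynchronous Splitting Algorithm

1: Initialize the messages uniformly to zero.

2: Choose some ordering of the variables, and perform the following update for each variable $j$

3: for each edge $(j, \beta)$ do

4: For all $i \in \beta \setminus j$ update the message from $i$ to $\beta$

$$m_{i \to \beta}(x_i) = \kappa + \frac{\phi_i(x_i)}{c_i} + (c_\beta - 1)\, m_{\beta \to i}(x_i) + \sum_{\alpha \in \partial i \setminus \beta} c_\alpha m_{\alpha \to i}(x_i)$$

5: Update the message from $\beta$ to $j$

$$m_{\beta \to j}(x_j) = \kappa + \min_{x_{\beta \setminus j}} \Bigl[ \frac{\psi_\beta(x_\beta)}{c_\beta} + (c_j - 1)\, m_{j \to \beta}(x_j) + \sum_{k \in \beta \setminus j} c_k m_{k \to \beta}(x_k) \Bigr]$$

6: end for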



We know that the ordinary min-sum algorithm is exact on trees. More specifically, given 
any choice of boundary messages, if we pass messages from the leaves to the root and back 
down on any subtree, then the beliefs over that subtree are guaranteed to be correct for that 
choice of boundary messages. In this case, we are guaranteed to produce a vector of beliefs 
that is min-consistent over that tree. For arbitrary choices of c, we are not guaranteed such 
a property on trees. Instead, we will show that, under a restriction on the choice of c, the 
message updates in Algorithm 2, which are performed sequentially over one variable node
of the graph at a time, ensure that the beliefs satisfy a weak notion of consistency over 
the factors with respect to that variable. Our primary tool in this section will be a lemma 
similar to Lemma 8.

Lemma 20 Let $c$ be a vector such that $c_i = 1$ for all $i$. Suppose we perform the update for
the edge $(j, \beta)$ as in Algorithm 2. If the vector of messages is finite after the update, then
$b_\beta$ is min-consistent with respect to $b_j$.

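Proof Let $m$ be the vector of messages before the update and let $m^+$ be the vector of
messages after the update. The proof of this lemma is similar to that of Lemma 8. Observe
that for each $i \in \beta \setminus j$,

\begin{align*}
m^+_{i \to \beta}(x_i) &= \kappa + \phi_i(x_i) + (c_\beta - 1)\, m_{\beta \to i}(x_i) + \sum_{\alpha \in \partial i \setminus \beta} c_\alpha m_{\alpha \to i}(x_i) \\
&= \kappa + \phi_i(x_i) + (c_\beta - 1)\, m^+_{\beta \to i}(x_i) + \sum_{\alpha \in \partial i \setminus \beta} c_\alpha m^+_{\alpha \to i}(x_i)
\end{align*}

Similarly,

$$m^+_{\beta \to j}(x_j) = \kappa + \min_{x_{\beta \setminus j}} \Bigl[ \frac{\psi_\beta(x_\beta)}{c_\beta} + \sum_{k \in \beta \setminus j} m^+_{k \to \beta}(x_k) \Bigr]$$

Up to an additive constant, we have:

\begin{align*}
\min_{x_{\beta \setminus j}} b^+_\beta(x_\beta) &= \min_{x_{\beta \setminus j}} \Bigl[ \frac{\psi_\beta(x_\beta)}{c_\beta} + \sum_{k \in \beta} \bigl( b^+_k(x_k) - m^+_{\beta \to k}(x_k) \bigr) \Bigr] \\
&= \min_{x_{\beta \setminus j}} \Bigl[ \frac{\psi_\beta(x_\beta)}{c_\beta} + \sum_{k \in \beta \setminus j} m^+_{k \to \beta}(x_k) \Bigr] + b^+_j(x_j) - m^+_{\beta \to j}(x_j) \\
&= m^+_{\beta \to j}(x_j) + \phi_j(x_j) + \sum_{\alpha \in \partial j \setminus \beta} c_\alpha m^+_{\alpha \to j}(x_j) + (c_\beta - 1)\, m^+_{\beta \to j}(x_j) \\
&= b^+_j(x_j)
\end{align*}

■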



Observe that, after performing all of the message updates for a node $j$ as in Algorithm
2, $b_\beta$ is min-consistent with respect to $b_j$ for every $\beta \in \mathcal{A}$ containing $j$. The most important
conclusion we can draw from this is that there is an $x_j^*$ that simultaneously minimizes $b_j$,
$\min_{x_{\beta \setminus j}} b_\beta$, and $\min_{x_{\beta \setminus j}} b_\beta - b_j$.

5.2 Global Convergence 

We will show that, under certain conditions, the vector of beliefs generated from the
messages at any point in Algorithm 2 is converging to a vector of admissible and min-consistent
beliefs. We note that this theorem does not show convergence in the normal sense (i.e. the
messages may not be converging to a fixed point of the message passing equations), but
as the theorems from the previous section only require that the beliefs be admissible and
min-consistent, this will suffice.

As in the previous section, suppose $m$ is a vector of finite messages with corresponding
beliefs $b$. If $f$ can be written as a conical combination of the beliefs, then we can lower
bound the optimal value of the objective function as follows:



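\begin{align}
\min_x f(x) &= \min_x \Bigl[ \kappa + \sum_{i, \alpha : i \in \alpha} d_{i\alpha} \bigl( b_\alpha(x_\alpha) - b_i(x_i) \bigr) + \sum_\alpha d_{\alpha\alpha} b_\alpha(x_\alpha) + \sum_i d_{ii} b_i(x_i) \Bigr] \tag{66} \\
&\ge \kappa + \sum_{i, \alpha : i \in \alpha} d_{i\alpha} \min_{x_\alpha} \bigl( b_\alpha(x_\alpha) - b_i(x_i) \bigr) + \sum_\alpha d_{\alpha\alpha} \min_{x_\alpha} b_\alpha(x_\alpha) + \sum_i d_{ii} \min_{x_i} b_i(x_i) \tag{67}
\end{align}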



This lower bound is a function of the message vector, $m$. Further, if the beliefs satisfy
the conditions of Theorem 17, then we can write this lower bound as:

$$\min_x f(x) \ge \kappa + \sum_i c_i \Bigl(1 - \sum_{\alpha \in \partial i} c_\alpha\Bigr) \min_{x_i} b_i(x_i) + \sum_\alpha c_\alpha \min_{x_\alpha} b_\alpha(x_\alpha) \tag{68}$$

Define $LB(m)$ to be the lower bound in Equation 68. Using Lemma 20, we can show
that, for certain choices of the parameter vector, each variable update as in Algorithm 2
can only increase this lower bound:

Theorem 21 Suppose $c_i = 1$ for all $i$, $c_\alpha > 0$ for all $\alpha$, and $\sum_{\alpha \in \partial i} c_\alpha \le 1$ for all $i$. If
the vector of messages is finite at each step of the asynchronous splitting algorithm, then
the algorithm converges to a collection of beliefs such that $LB(m)$ cannot be improved by
further iteration of the asynchronous algorithm. Further, if upon convergence of the lower
bound the beliefs admit a unique estimate $x^*$, then $x^*$ is the global minimum of the objective
function.

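Proof We will show that the message updates performed for variable node $j$ cannot
decrease the lower bound. Let $LB_j(m)$ denote the terms in $LB$ that involve the variable
$x_j$. We can upper bound $LB_j$ as follows:

\begin{align*}
LB_j(m) &\le \min_{x_j} \Bigl[ \Bigl(1 - \sum_{\beta \in \partial j} c_\beta\Bigr) b_j(x_j) + \sum_{\beta \in \partial j} c_\beta \min_{x_{\beta \setminus j}} b_\beta(x_\beta) \Bigr] \\
&= \min_{x_j} \Bigl[ \Bigl(1 - \sum_{\beta \in \partial j} c_\beta\Bigr) b_j(x_j) + \sum_{\beta \in \partial j} c_\beta \min_{x_{\beta \setminus j}} \Bigl[ \frac{\psi_\beta(x_\beta)}{c_\beta} + \sum_{k \in \beta} \bigl( b_k(x_k) - m_{\beta \to k}(x_k) \bigr) \Bigr] \Bigr] \\
&= \min_{x_j} \Bigl[ \phi_j(x_j) + \sum_{\beta \in \partial j} c_\beta \min_{x_{\beta \setminus j}} \Bigl[ \frac{\psi_\beta(x_\beta)}{c_\beta} + \sum_{k \in \beta \setminus j} \bigl( b_k(x_k) - m_{\beta \to k}(x_k) \bigr) \Bigr] \Bigr]
\end{align*}

Notice that this upper bound does not depend on the choice of the messages from $\beta$ to
$j$ for any $\beta \in \partial j$. As a result, any choice of these messages for which the inequality is
tight must maximize $LB_j$. Observe that the upper bound is tight iff there exists an $x_j$ that
simultaneously minimizes $b_j$ and $\min_{x_{\beta \setminus j}} b_\beta$ for each $\beta \in \partial j$. By Lemma 20, this is indeed
the case after performing the updates in Algorithm 2 for the variable node $j$. As this is the
only part of the lower bound affected by the update, we have that $LB$ cannot decrease. Let
$m$ be the vector of messages before the update for variable $j$ and $m^+$ the vector after the
update. Define $LB_{-j}$ to be the sum of the terms of the lower bound that do not involve
the variable $x_j$. By definition and the above we have:

\begin{align*}
LB(m) &= LB_{-j}(m) + LB_j(m) \\
&\le LB_{-j}(m) + \min_{x_j} \Bigl[ \Bigl(1 - \sum_{\beta \in \partial j} c_\beta\Bigr) b_j(x_j) + \sum_{\beta \in \partial j} c_\beta \min_{x_{\beta \setminus j}} b_\beta(x_\beta) \Bigr] \\
&= LB_{-j}(m^+) + \min_{x_j} \Bigl[ \Bigl(1 - \sum_{\beta \in \partial j} c_\beta\Bigr) b^+_j(x_j) + \sum_{\beta \in \partial j} c_\beta \min_{x_{\beta \setminus j}} b^+_\beta(x_\beta) \Bigr] \\
&= LB(m^+)
\end{align*}

$LB(m)$ is bounded from above by $\min_x f(x)$. From this we can conclude that the value of
the lower bound converges.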

Finally, the lower bound has converged if no single variable update can improve the
bound. By the arguments above, this must mean that there exists an $x_j$ that simultaneously
minimizes $b_j$ and $\min_{x_{\beta \setminus j}} b_\beta$ for each $\beta \in \partial j$. These beliefs may or may not be
min-consistent. Now, if there exists a unique minimizer $x^*$, then $x^*$ must simultaneously
minimize $b_j$ and $\min_{x_{\beta \setminus j}} b_\beta$ for each $\beta \in \partial j$. From this we can conclude that $x^*$
simultaneously minimizes all of the beliefs and therefore, using the argument from Theorem 17, must
minimize the objective function. ■
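The tightness condition used in the proof rests on an elementary fact: a sum of separate minima never exceeds the minimum of the sum, and the two agree exactly when the terms share a minimizer. The following is a minimal sketch of that fact only, not of Algorithm 2. The helper names `sum_of_mins`, `min_of_sum`, and `share_minimizer` are hypothetical, and each table is assumed to hold one already-weighted term of $LB_j$ as a function of $x_j$.

```python
import numpy as np

# Hypothetical helpers for illustration; each table is one term of the
# bound, tabulated over the values of a single variable x_j.

def sum_of_mins(tables):
    """Lower-bound style quantity: minimize each term separately."""
    return sum(float(np.min(t)) for t in tables)

def min_of_sum(tables):
    """Upper-bound style quantity: minimize the terms jointly over x_j."""
    return float(np.min(np.sum(tables, axis=0)))

def share_minimizer(tables):
    """True if some value of x_j minimizes every table simultaneously."""
    common = None
    for t in tables:
        t = np.asarray(t, dtype=float)
        argmins = set(np.flatnonzero(np.isclose(t, t.min())).tolist())
        common = argmins if common is None else common & argmins
    return bool(common)

# Both terms are minimized at x_j = 0, so the bound is tight.
aligned = [np.array([0.0, 2.0, 5.0]), np.array([1.0, 3.0, 1.5])]
# The terms disagree on the minimizer, so the bound is strict.
misaligned = [np.array([0.0, 2.0, 5.0]), np.array([3.0, 1.0, 1.5])]

for name, tables in (("aligned", aligned), ("misaligned", misaligned)):
    print(name, sum_of_mins(tables), min_of_sum(tables), share_minimizer(tables))
```

In the aligned case the two quantities coincide, which mirrors the situation after the update for variable node $j$; in the misaligned case the separate minima give a strictly smaller value.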



We note that, even with this restricted choice of the parameter vector, Algorithm 2 is not
strictly a member of the family of bound minimizing algorithms discussed in Meltzer et al.
The disparity occurs because the definition of a bound minimizing algorithm as
presented therein would require $b_\alpha$ to be min-consistent with respect to $x_j$ for all $j \in \alpha$
after the update is performed over the edge $(i, \alpha)$. Instead, Algorithm 2 only guarantees
that $b_\alpha$ is min-consistent with respect to $x_i$ after the update.

Although the restriction on the parameter vector in Theorem 21 seems strong, we observe
that it captures the TRMP algorithm, and for any objective function $f$, we can choose
the parameters such that the theorem is sufficient to guarantee convergence and global 
optimality: 

Corollary 22 Define $d = \max_i |\partial i|$. If $c$ is chosen as in Theorem 21 and the vector of
messages is finite at each step of the asynchronous splitting algorithm, then the lower bound
converges. Further, if the converged beliefs admit a unique estimate $x^*$, then $x^*$ is a global
minimum of $f$.



We note that Theorem 21 is similar to a special case of Theorem 1 in Hazan and Shashua.
For an appropriate setting of the parameters in their algorithm, the message updates
for the special case in Theorem 21 are identical to those in Algorithm 2, and the two message
passing algorithms differ only in the order of the updates. This difference turns out to be
significant, and the proofs of convergence for the two algorithms are entirely different.



5.3 Local Convergence 

Even if the objective function cannot be written as a conical combination of the beliefs, 
we can still use lower bounds on the objective function to explain the behavior of the 
asynchronous message passing algorithm. Suppose $m$ is a vector of finite messages with
corresponding beliefs $b$. If $c$ is a vector of non-zero reals such that, for all $i$,
$c_i b_i(x_i) + \sum_{\alpha \in \partial i} c_\alpha [b_\alpha(x_\alpha) - c_i b_i(x_i)]$ can be written as a conical combination
of the beliefs, then we can lower bound the optimal value of the objective function as follows:



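\[
\begin{aligned}
\min_x f(x_j, x_{-j}) &= \min_x \Big[ \kappa + \sum_i c_i b_i(x_i) + \sum_\alpha c_\alpha \Big[ b_\alpha(x_\alpha) - \sum_{k \in \alpha} c_k b_k(x_k) \Big] \Big] && (69) \\
&= \min_{x_{-j}} \Big[ g_j(x_{-j}) + \min_{x_j} \Big[ c_j b_j(x_j) + \sum_{\alpha \in \partial j} c_\alpha \big[ b_\alpha(x_\alpha) - c_j b_j(x_j) \big] \Big] \Big] && (70) \\
&\ge \min_{x_{-j}} g_j(x_{-j}) + d_{jj} \min_{x_j} b_j(x_j) + \sum_{\alpha \in \partial j} d_{j\alpha} \min_{x_\alpha} \big( b_\alpha(x_\alpha) - b_j(x_j) \big) \\
&\quad + \sum_{\alpha \in \partial j} d_{\alpha\alpha} \min_{x_\alpha} b_\alpha(x_\alpha) && (71)
\end{aligned}
\]
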
Applying Lemma 20 to this lower bound yields the following theorem:



Theorem 23 Let $c$ be a vector such that $c_i = 1$ for all $i$ and $c_\alpha > 0$ for all $\alpha$. If the vector
of messages is finite at each step of the asynchronous splitting algorithm, then the message
update for the variable node $j$ in Algorithm 2 cannot decrease the lower bound in Equation 71.



We note that this theorem does not necessarily guarantee the convergence of the asyn- 
chronous message passing algorithm as the bound being improved is different for each 
variable. Hence, stronger assumptions are necessary in order to ensure convergence when 
the parameter vector only satisfies the conditions for local optimality. 

5.4 Tightness of the Lower Bounds 

Although Theorem 21 guarantees convergence, the lower bound in Equation 68 evaluated
at the converged beliefs may not actually be tight. If the bound is tight, then any minimum 
of the objective function must simultaneously minimize each of the beliefs. In this section, 
we will provide a condition under which the lower bound cannot be tight. The intuition 
behind this condition is that the min-sum algorithm, in attempting to solve the minimization 
problem on one factor graph, is actually attempting to solve the minimization problem over 
an entire family of equivalent (in some sense) factor graphs. To make this more precise, we 
introduce the notion of a graph cover:

Definition 24 A graph $H$ covers a graph $G$ if there exists a graph homomorphism
$h : H \to G$ such that $h$ is an isomorphism on $\partial v$ for all vertices $v \in H$. If $h(v) = u$,
then we say that $v \in H$ is a copy of $u \in G$. Further, $H$ is a $k$-cover of $G$ if every vertex of
$G$ has exactly $k$ copies in $H$.
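Definition 24 can be checked mechanically on small graphs. The sketch below is an illustration under stated assumptions rather than part of the paper's development: graphs are simple and given as adjacency dictionaries, the map $h$ is a Python dict, and the names `is_cover` and `cover_degree` are hypothetical. For factor graphs one would additionally require $h$ to map variable nodes to variable nodes and factor nodes to factor nodes, which this sketch does not check.

```python
def is_cover(adj_h, adj_g, h):
    """Check that h : H -> G is a covering map (hypothetical helper).

    adj_h, adj_g: dicts mapping each vertex to a list of its neighbors.
    h: dict mapping each vertex of H to a vertex of G.
    """
    # Every vertex of G needs at least one copy in H.
    if set(h.values()) != set(adj_g):
        return False
    for v, nbrs in adj_h.items():
        images = [h[w] for w in nbrs]
        # h must be injective on the neighborhood of v ...
        if len(set(images)) != len(images):
            return False
        # ... and map it onto the neighborhood of h(v).
        if set(images) != set(adj_g[h[v]]):
            return False
    return True


def cover_degree(adj_g, h):
    """Return k if every vertex of G has exactly k copies, else None."""
    counts = {u: 0 for u in adj_g}
    for u in h.values():
        counts[u] += 1
    degrees = set(counts.values())
    return degrees.pop() if len(degrees) == 1 else None


# A triangle G and a hexagon H that wraps around it twice.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
hexagon = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
wrap = {v: v % 3 for v in range(6)}

# A path on four vertices is not a cover: its endpoints lose a neighbor.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
path_map = {v: v % 3 for v in range(4)}

print(is_cover(hexagon, triangle, wrap), cover_degree(triangle, wrap))
print(is_cover(path, triangle, path_map))
```

The neighborhood test also implies that edges of $H$ map to edges of $G$, so no separate homomorphism check is needed in this sketch.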

If $H$ covers the factor graph $G$, then $H$ has the same local properties as $G$. To any cover
$H$ we can associate a collection of potentials in which the potential at node $i \in H$ is equal
to the potential at node $h(i) \in G$. The min-sum algorithm is incapable of distinguishing
the two factor graphs $H$ and $G$ given that the initial messages to and from each node in $H$
are identical to those of the nodes they cover in $G$. Observe that for every node $v \in G$ the messages
received and sent by this node at time $t$ are exactly the same as the messages received and
sent at time $t$ by any copy of $v$ in $H$. As a result, if we use the min-sum algorithm
to deduce an assignment for $v$, the algorithm run on the graph $H$ must deduce the same
assignment for each copy of $v$. We summarize this equivalence with the following lemma:

Lemma 25 Let $b^G$ be a vector of beliefs that is admissible and min-consistent for a function
$f^G$ with corresponding factor graph $G$ and a non-zero parameter vector $c$. If $H$ is a graph
cover of $G$ via $h : H \to G$ with corresponding objective function $f^H$, then the vector of
beliefs $b^H$ such that

• For all $\alpha \in H$, $b^H_\alpha = b^G_{h(\alpha)}$

• For all $i \in H$, $b^H_i = b^G_{h(i)}$

is admissible and min-consistent for $f^H$.

In essence, the beliefs $b^H$ are simple copies of the beliefs on $G$. We could create an
estimate $x^H$ from an estimate $x^G$ by duplicating the components in the same way as in the
statement of the lemma. A minimum of the objective function $f^H$ need not be a copy of
some minimum of the objective function $f^G$. Even worse, minima of $f^G$ need not correspond
to minima of $f^H$. This idea is the basis for the theory of pseudocodewords in the LDPC
community (Vontobel and Koetter, 2005). In this community, solutions on covers that are
not copies of solutions of the original problem are referred to as pseudocodewords.
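A small brute-force experiment may make the gap between a problem and its covers concrete. The example below is an assumption chosen for illustration and does not appear in the paper: a triangle of binary variables in which every edge pays a cost of 1 when its endpoints agree, together with the hexagon that 2-covers it via $h(v) = v \bmod 3$. The function names `objective` and `minimize` are hypothetical.

```python
from itertools import product

def objective(edges, x):
    """Sum of edge costs: an edge costs 1 when its endpoints agree."""
    return sum(1 for i, j in edges if x[i] == x[j])

def minimize(edges, n):
    """Exhaustive minimization over binary assignments of n variables."""
    best = min(product((0, 1), repeat=n), key=lambda x: objective(edges, x))
    return objective(edges, best), best

# G: a triangle. H: a hexagon covering G twice via h(v) = v % 3, with
# the same cost copied onto every edge of the cover.
triangle = [(0, 1), (1, 2), (2, 0)]
hexagon = [(v, (v + 1) % 6) for v in range(6)]

min_g, x_g = minimize(triangle, 3)
min_h, x_h = minimize(hexagon, 6)

# Lift the minimizer on G to the cover by copying components.
lifted = tuple(x_g[v % 3] for v in range(6))

print("min on G:", min_g)
print("min on H:", min_h)
print("lifted copy on H:", objective(hexagon, lifted))
print("H minimizer is a copy:", all(x_h[v] == x_h[v + 3] for v in range(3)))
```

On the triangle at least one edge must agree, while the hexagon admits an alternating assignment with no agreeing edge. The minimizer on the cover therefore assigns different values to the two copies of each variable, so it is not a copy of any assignment on $G$, and the lifted minimizer of $f^G$ is not a minimum of $f^H$.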

Theorem 15 guarantees us correctness upon convergence to a unique estimate under an
appropriate choice of parameters. The same correctness argument can be made for any 
cover: 

Theorem 26 Let $b^G$ be a vector of beliefs that is admissible and min-consistent for a function
$f^G$ with corresponding factor graph $G$ and a non-zero parameter vector $c$ such that $f^G$
can be written as a conical combination of the beliefs. Suppose $x^G$ simultaneously minimizes
each of the beliefs $b^G$. If $h : H \to G$ is a covering homomorphism, then $x^H$ defined such
that $x^H_i = x^G_{h(i)}$ is a global minimum of $f^H$.

Proof Let $H$ be a cover of $G$. By Lemma 25, we know that we can construct a vector of
beliefs $b^H$ that is admissible and min-consistent for $f^H$ by copying beliefs from $b^G$. If $x^G$
minimizes each of the beliefs in $b^G$, then by construction, $x^H$ minimizes all of the beliefs in
$b^H$. We can then apply the proof of Theorem 15 to conclude that $x^H$ is a global minimum
of $f^H$. ■



From this theorem, we can conclude that if the objective function can be written as a
conical combination of the beliefs, then unless there is an assignment that uniquely minimizes
$f^G$ and $f^H$ for any cover $H$ of $G$, neither Algorithm 1 nor Algorithm 2 can converge
to a unique estimate.

6. Pairwise Binary Factorizations 

For special cases, we can strengthen and extend the results of the previous sections. One 
such special case is when the state space is binary (e.g. X = {0, 1}) and the objective 
function can be written as a sum of self-potentials and factor potentials that are a function of 
at most two of the variables. Many important problems can be captured by this restriction: 
the Ising model, the maximum weight independent set problem (MWIS), minimizing a 
quadratic function over a binary state space, etc. 

6.1 Partial Solutions 

In the previous sections, we examined the situation in which all of the $b_i$ had a unique
argmin. As we discussed in the preliminaries, if this is not the case, then we may not
always be able to extend these partial solutions into a global minimum. Suppose $b$ is a
vector of admissible beliefs for the function $f$. If $f$ can be written as a conical combination
of these beliefs, then, under certain conditions, we can demonstrate that the partial solution
constructed from each $i$ such that $b_i$ has a unique argmin can be extended to a global
minimum. This result is identical to the results of Kolmogorov and Wainwright (2005) and
Weiss et al. (2007), but we will present it here for completeness:



Theorem 27 For pairwise factorizations, if $b_j$ is constant for each variable $j$ such that
there is a factor containing $j$ and some fixed variable $i$, then the partial assignment $x^*_F$ given
by the fixed variables can be extended to a global minimum of the objective function.



Proof See the corollary to Theorem 2 in Weiss et al. (2007) or Theorem 2 in Kolmogorov and
Wainwright (2005). ■



6.2 Tightness of the Lower Bound 



Perhaps the most interesting property of pairwise binary factorizations is that the lower
bound, $LB$, is maximized for any collection of admissible and min-consistent beliefs. This
was demonstrated in Kolmogorov and Wainwright (2005), but we will extend their result
and provide an alternate proof. Specifically, we will produce a 2-cover and an assignment
that minimizes the objective function on that 2-cover. Hence, we will have that the $LB$ is
tight for the objective function on the 2-cover. By our arguments in Section 5.4, we know
that no vector of min-consistent and admissible beliefs can do better than this.



Theorem 28 Let $b$ be a vector of admissible and min-consistent beliefs for the objective
function $f_G$ with factor graph $G$ obtained from a collection of messages $m$. If the graphical
model is pairwise binary and the objective function can be written as a conical combination
of the beliefs, then the beliefs maximize the lower bound, $LB(m)$.

Proof Without loss of generality we can assume that $X = \{0, 1\}$. Because the factors
are pairwise, we can draw $G$ as a graph with a node for each variable and an edge joining
two nodes if there is a factor $\alpha = \{i, j\} \in \mathcal{A}$. We will construct a 2-cover, $H$, of the factor
graph $G$ and an assignment $x^*$ such that $x^*$ minimizes $f_H$. We will index the copies of
variable $i \in G$ in the factor graph $H$ as $i_1$ and $i_2$. First, we will construct the assignment. If
$\arg\min_{x_i} b_i(x_i)$ is unique, then set $x^*_{i_1} = x^*_{i_2} = \arg\min_{x_i} b_i(x_i)$. Otherwise, set $x^*_{i_1} = 0$ and
$x^*_{i_2} = 1$. Now, we will construct a 2-cover, $H$, such that $x^*$ minimizes each of the beliefs.
We will do this factor by factor. Consider the factor $\alpha = \{i, j\} \in \mathcal{A}$. There are several
possibilities:

1. $b_i$ and $b_j$ have unique argmins. In this case, $b_\alpha$ is minimized at $(x^*_{i_1}, x^*_{j_1})$. So, we
can add the edges $(i_1, j_1)$ and $(i_2, j_2)$ to $H$. The corresponding beliefs $b_{i_1 j_1}$ and $b_{i_2 j_2}$
are minimized at $x^*$.

2. $b_i$ has a unique argmin and $b_j$ is minimized at both 0 and 1 (or vice versa). In this
case, we have $x^*_{i_1} = x^*_{i_2}$, $x^*_{j_1} = 0$, and $x^*_{j_2} = 1$. By min-consistency, we can conclude
that $b_\alpha$ is minimized at $(x^*_{i_1}, 0)$ and $(x^*_{i_1}, 1)$. Therefore, we can add the edges $(i_1, j_1)$
and $(i_2, j_2)$ to $H$.

3. $b_i$ and $b_j$ are minimized at both 0 and 1. In this case, we have $x^*_{i_1} = 0$, $x^*_{i_2} = 1$,
$x^*_{j_1} = 0$, and $x^*_{j_2} = 1$. By min-consistency, there is an assignment that minimizes
$b_\alpha$ with $x_i = 0$ and an assignment that minimizes $b_\alpha$ with $x_i = 1$. This means that
$\arg\min_{x_i, x_j} b_\alpha$ contains at least one of the sets $\{(0, 0), (1, 1)\}$ and $\{(0, 1), (1, 0)\}$. For
the first case, we can add the edges $(i_1, j_1)$ and $(i_2, j_2)$ to $H$, and in the second case,
we can add $(i_1, j_2)$ and $(i_2, j_1)$ to $H$.
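The case analysis above is constructive, and can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: beliefs are given as NumPy tables over $\{0, 1\}$, ties are detected with a floating point tolerance, copies of variable $i$ are represented as the pairs `(i, 1)` and `(i, 2)`, and the names `argmins` and `build_two_cover` are hypothetical. The example beliefs are hand-picked to be min-consistent and fall under case 3.

```python
import numpy as np

def argmins(table):
    """Set of index tuples at which a belief table attains its minimum."""
    table = np.asarray(table, dtype=float)
    hits = np.argwhere(np.isclose(table, table.min()))
    return {tuple(int(k) for k in idx) for idx in hits}

def build_two_cover(var_beliefs, edge_beliefs):
    """Build the assignment and the edges of a 2-cover (hypothetical helper).

    var_beliefs: dict i -> table of length 2.
    edge_beliefs: dict (i, j) -> 2x2 table indexed [x_i, x_j].
    """
    x = {}
    for i, b in var_beliefs.items():
        mins = sorted(k[0] for k in argmins(b))
        # Unique argmin: both copies take it. Tie: copies take 0 and 1.
        x[(i, 1)], x[(i, 2)] = (mins[0], mins[0]) if len(mins) == 1 else (0, 1)
    cover_edges = []
    for (i, j), b in edge_beliefs.items():
        mins = argmins(b)
        straight = [((i, 1), (j, 1)), ((i, 2), (j, 2))]
        crossed = [((i, 1), (j, 2)), ((i, 2), (j, 1))]
        # Prefer the straight pairing; fall back to the crossed one.
        for candidate in (straight, crossed):
            if all((x[u], x[v]) in mins for u, v in candidate):
                cover_edges.extend(candidate)
                break
        else:
            raise ValueError("beliefs are not min-consistent on edge %r" % ((i, j),))
    return x, cover_edges

# Example: every variable belief is tied and every edge belief prefers
# disagreement, so each edge must use the crossed pairing (case 3).
tie = np.array([0.0, 0.0])
disagree = np.array([[1.0, 0.0], [0.0, 1.0]])
var_beliefs = {i: tie for i in range(3)}
edge_beliefs = {(0, 1): disagree, (1, 2): disagree, (0, 2): disagree}

x_star, cover_edges = build_two_cover(var_beliefs, edge_beliefs)
print(x_star)
print(cover_edges)
```

Each variable and each factor is visited once, which is consistent with the polynomial-time construction the proof relies on.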



Consider the special case in which $c_i = 1$ for all $i$. For a pairwise factorization, if none
of the variable nodes in the factor graph are split, then the message passing equations in
Algorithm 1 simplify to the following message update equation:

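\[
m^t_{i \to j}(x_j) = \min_{x_i} \Big[ \phi_i(x_i) + \frac{\psi_{ij}(x_i, x_j)}{c_{ij}} + (c_{ij} - 1)\, m^{t-1}_{j \to i}(x_i) + \sum_{k \in \partial i \setminus j} c_{ki}\, m^{t-1}_{k \to i}(x_i) \Big] \qquad (72)
\]

Similarly, the beliefs can be simplified to:

\[
\begin{aligned}
b^t_i(x_i) &= \phi_i(x_i) + \sum_{k \in \partial i} c_{ki}\, m^t_{k \to i}(x_i) && (73) \\
b^t_{ij}(x_i, x_j) &= \frac{\psi_{ij}(x_i, x_j)}{c_{ij}} + \phi_i(x_i) + \phi_j(x_j) + (c_{ij} - 1)\, m^t_{j \to i}(x_i) + \sum_{k \in \partial i \setminus j} c_{ki}\, m^t_{k \to i}(x_i) \\
&\quad + (c_{ij} - 1)\, m^t_{i \to j}(x_j) + \sum_{k \in \partial j \setminus i} c_{kj}\, m^t_{k \to j}(x_j) && (74)
\end{aligned}
\]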
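Equations (72) to (74) translate fairly directly into code. The following is a minimal sketch under stated assumptions, not the authors' implementation: the model is a binary triangle with random potentials, all edge weights are set to 0.4 so that the weights at each variable sum to less than 1, messages are updated in place in a fixed order, and every function name is hypothetical. It also evaluates a lower bound of the form $\sum_i (1 - \sum_{k \in \partial i} c_{ik}) \min b_i + \sum_{ij} c_{ij} \min b_{ij}$, which is the pairwise form of $LB$ assumed here for $c_i = 1$.

```python
import itertools
import numpy as np

def neighbors(edges, i):
    return [b if a == i else a for a, b in edges if i in (a, b)]

def weight(c, i, j):
    return c[(i, j)] if (i, j) in c else c[(j, i)]

def pairwise(psi, i, j):
    """psi_ij as a table indexed [x_i, x_j], whichever way it is stored."""
    return psi[(i, j)] if (i, j) in psi else psi[(j, i)].T

def update_message(phi, psi, c, msgs, i, j, edges):
    """Equation (72): the new message from i to j, a table over x_j."""
    cij = weight(c, i, j)
    inner = phi[i] + (cij - 1.0) * msgs[(j, i)]
    for k in neighbors(edges, i):
        if k != j:
            inner = inner + weight(c, k, i) * msgs[(k, i)]
    # Rows index x_i and columns index x_j; minimize over x_i.
    return np.min(pairwise(psi, i, j) / cij + inner[:, None], axis=0)

def variable_belief(phi, c, msgs, i, edges):
    """Equation (73)."""
    return phi[i] + sum(weight(c, k, i) * msgs[(k, i)] for k in neighbors(edges, i))

def edge_belief(phi, psi, c, msgs, i, j, edges):
    """Equation (74), using b_i - m_{j->i} for the terms that depend on x_i."""
    bi = variable_belief(phi, c, msgs, i, edges) - msgs[(j, i)]
    bj = variable_belief(phi, c, msgs, j, edges) - msgs[(i, j)]
    return pairwise(psi, i, j) / weight(c, i, j) + bi[:, None] + bj[None, :]

def lower_bound(phi, psi, c, msgs, edges):
    total = 0.0
    for i in phi:
        slack = 1.0 - sum(weight(c, i, k) for k in neighbors(edges, i))
        total += slack * variable_belief(phi, c, msgs, i, edges).min()
    for i, j in edges:
        total += weight(c, i, j) * edge_belief(phi, psi, c, msgs, i, j, edges).min()
    return total

# Assumed toy model: a binary triangle with random potentials.
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (0, 2)]
phi = {i: rng.normal(size=2) for i in range(3)}
psi = {e: rng.normal(size=(2, 2)) for e in edges}
c = {e: 0.4 for e in edges}
msgs = {d: np.zeros(2) for i, j in edges for d in ((i, j), (j, i))}

for _ in range(20):
    for i, j in list(msgs):
        msgs[(i, j)] = update_message(phi, psi, c, msgs, i, j, edges)

def objective(x):
    return sum(phi[i][x[i]] for i in phi) + sum(psi[e][x[e[0]], x[e[1]]] for e in edges)

f_min = min(objective(x) for x in itertools.product((0, 1), repeat=3))
lb = lower_bound(phi, psi, c, msgs, edges)
print("lower bound:", lb, "minimum of f:", f_min)
```

Because the beliefs reparameterize the objective for any finite messages, the printed lower bound should never exceed the brute-force minimum, whether or not the updates have converged. The update order used here is a plain sweep and is not claimed to match the asynchronous schedule of Algorithm 2.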

Now, suppose that $f$ admits a pairwise binary factorization and that $c$ is a positive
vector such that $\sum_{k \in \partial i} c_{ik} \le 1$ for all $i$. As a consequence of Theorem 28, the asynchronous
splitting algorithm provides us with a convergent algorithm that is guaranteed to produce
a solution on a 2-cover of the original problem. Moreover, as a consequence of the proof, we
can construct such a 2-cover and the optimal assignment on this 2-cover in polynomial time
given any vector of min-consistent and admissible beliefs. Hence, Theorem 28 and Theorem
21, taken together, provide a nearly complete characterization of the behavior of the min-sum
algorithm for this special case. Only the rate of convergence remains unresolved.

7. Discussion 

The splitting technique described in this paper has a wide variety of applications. The 
proposed algorithm closely mirrors the traditional min-sum algorithm, but it can be proven 
to be both convergent and correct under appropriate message passing schedules. Most 
important is the simplicity of the derivation: a naive rewriting of the factorization. Other 
rewritings may produce even more families of message passing algorithms whose convergence 
and correctness properties may be even better than the splitting algorithm considered in 
this paper. 